Apparatus and method for providing a loudspeaker-enclosure-microphone system description

ABSTRACT

An apparatus for providing a current loudspeaker-enclosure-microphone system description of a loudspeaker-enclosure-microphone system is provided. The apparatus has a first transformation unit for generating a plurality of wave-domain loudspeaker audio signals. Moreover, the apparatus has a second transformation unit for generating a plurality of wave-domain microphone audio signals. Furthermore, the apparatus has a system description generator for generating the current loudspeaker-enclosure-microphone system description based on the plurality of wave-domain loudspeaker audio signals, based on the plurality of wave-domain microphone audio signals, and based on a plurality of coupling values, wherein the system description generator is configured to determine each coupling value assigned to a wave-domain pair of a plurality of wave-domain pairs by determining a relation indicator indicating a relation between a loudspeaker-signal-transformation value and a microphone-signal-transformation value.

CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation of copending International Application No. PCT/EP2012/064827, filed Jul. 27, 2012, which is incorporated herein by reference in its entirety.

BACKGROUND OF THE INVENTION

The present invention relates to audio signal processing and, in particular, to an apparatus and method for identifying a loudspeaker-enclosure-microphone system.

Spatial audio reproduction technologies are becoming increasingly important. Emerging spatial audio reproduction technologies, such as wave field synthesis (WFS) (see [1]) or higher-order Ambisonics (HOA) (see [2]), aim at creating or reproducing acoustic wave fields that provide a perfect spatial impression of the desired acoustic scene in an extended listening area. Reproduction technologies like WFS or HOA provide a high-quality spatial impression to the listener, utilizing a large number of reproduction channels. To this end, typically, loudspeaker arrays with dozens to hundreds of elements are used. The combination of these techniques with spatial recording systems opens up new fields of applications such as immersive telepresence and natural acoustic human/machine interaction. To obtain a more immersive user experience, such reproduction systems may be complemented by a spatial recording system to approach new application fields or to improve the reproduction quality. The combination of the loudspeaker array, the enclosing room and the microphone array is referred to as a loudspeaker-enclosure-microphone system and is identified in many application scenarios by observing the present loudspeaker and microphone signals. As an example, the local acoustic scene in a room is often recorded in a room where another acoustic scene is played back by a reproduction system.

However, the desired microphone signals of the local acoustic scene cannot be observed without the echo of the loudspeakers in such scenarios. In a teleconference, the resulting signals would annoy the far-end party [3], while a speech recognizer in a voice-based human/machine front end will generally exhibit poor recognition rates [4]. Acoustic echo cancellation (AEC) is commonly used to remove the unwanted loudspeaker echo from the recorded microphone signals while preserving the desired signals of the local acoustic scene without quality degradation. To this end, the loudspeaker-enclosure-microphone system (LEMS) is modeled by an adaptive filter which produces an estimate of the loudspeaker echoes contained in the microphone signals, and this estimate is subtracted from the actual microphone signals. This task comprises an identification of the LEMS, ideally leading to a unique solution. In the following, the term LEMS refers to a MIMO LEMS (Multiple-Input Multiple-Output LEMS).

AEC is significantly more challenging in the case of multichannel (MC) reproduction compared to the single-channel case, because the nonuniqueness problem [5] will generally occur: Due to the strong cross-correlation between the loudspeaker signals (e.g., those for the left and the right channel in a stereo setup), the identification problem is ill-conditioned and it may not be possible to uniquely identify the impulse responses of the corresponding LEMSs [6]. The system identified instead denotes only one of infinitely many solutions defined by the correlation properties of the loudspeaker signals. Therefore the true LEMS is only incompletely identified. The nonuniqueness problem is already known from stereophonic AEC (see, e.g., [6]) and becomes severe for massive multichannel reproduction systems like, e.g., wave field synthesis systems.

An incompletely identified system still describes the behavior of the true LEMS for the present loudspeaker signals and may therefore be used for different adaptive filtering applications, although the identified impulse responses may differ from the true impulse responses. In the case of AEC, the obtained impulse responses describe the LEMS sufficiently well to significantly suppress the loudspeaker echo.

However, when the cross-correlation properties of the loudspeaker signals change, this is no longer true and the behavior of systems relying on adaptive filters may in fact be uncontrollable. When there is a change in the cross-correlation of the loudspeaker signals, a breakdown of the echo cancellation performance is the typical consequence. This lack of robustness constitutes a major obstacle for the application of multichannel AEC (MCAEC). Moreover, other applications, such as listen room equalization (also called listening room equalization) or active noise cancellation (also called active noise control), also rely on a system identification and are strongly affected in a similar way.
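The nonuniqueness problem and the resulting breakdown can be illustrated with a deliberately simplified sketch. It assumes a two-loudspeaker, one-microphone LEMS whose paths are memoryless gains and a right channel that is a scaled copy of the left channel; the names `h_true`, `h_est`, `a_old` and `a_new` and all numeric values are hypothetical and serve only as an illustration.

```python
# Simplified illustration (assumption: memoryless gains instead of impulse responses).

def echo(gains, channels):
    """Echo at the microphone for one sample of each loudspeaker channel."""
    return sum(g * x for g, x in zip(gains, channels))

def loudspeaker_signals(source_sample, a):
    """Fully correlated stereo signals: the right channel is a scaled copy of the left."""
    return (source_sample, a * source_sample)

def max_residual(h_true, h_est, source, a):
    """Largest absolute echo residual left after subtracting the estimated echo."""
    return max(
        abs(echo(h_true, loudspeaker_signals(s, a)) - echo(h_est, loudspeaker_signals(s, a)))
        for s in source
    )

h_true = (0.75, 0.25)            # hypothetical true path gains
source = [1.0, -0.5, 0.25, 2.0]  # hypothetical source samples
a_old, a_new = 0.5, -1.0         # inter-channel relation before and after the change

# One of infinitely many estimates that explain the observations for a_old:
# only the combination h1 + a_old * h2 is determined by the data.
h_est = (h_true[0] + a_old * h_true[1], 0.0)

residual_before = max_residual(h_true, h_est, source, a_old)
residual_after = max_residual(h_true, h_est, source, a_new)
print(residual_before, residual_after)
```

In this sketch the estimate differs from the true gains, yet it cancels the echo as long as the inter-channel relation stays at `a_old`; once the relation changes to `a_new`, a residual echo appears.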

To increase robustness under these conditions, the loudspeaker signals are often altered to achieve a decorrelation so that the true LEMS can be uniquely identified. A decorrelation of the loudspeaker signals is a common choice.

For this purpose, three options are known: adding mutually independent noise signals to the loudspeaker signals [5,7,8], different nonlinear preprocessing [6,9], or differently time-varying filtering [10,11] for each loudspeaker signal. Although perfect solutions are unknown, a time-varying phase modulation has been shown to be applicable even to high-quality audio [11]. While the mentioned techniques should ideally not impair the perceived sound quality, an application of these approaches for the mentioned reproduction techniques might not be an optimum choice: As the loudspeaker signals for WFS and HOA are analytically determined, time-varying filtering might significantly distort the reproduced wave field and, when aiming at high-quality audio reproduction, a listener will probably not accept the addition of noise signals or nonlinear preprocessing.

There might be scenarios where an alteration of the loudspeaker signals is unwanted or impractical. An example is given by WFS, where the loudspeaker signals are determined according to the underlying theory and a deviation in phase would distort the reproduced wave field. Another example is the extension of reproduction systems, where the loudspeaker signals are observable, but cannot be altered. However, in such cases it is still possible to mitigate the consequences of the nonuniqueness problem by heuristic approaches to improve the system description. Such heuristics can be based on knowledge about the transducer positions and the resulting impulse responses of the LEMS. For a stereophonic AEC in a symmetric array setup this was proposed by Shimauchi et al. [12], assuming that the symmetric array setup results in a symmetry of the impulse responses for the corresponding loudspeaker-to-microphone paths.

Allowing no alteration of the loudspeaker signals, it is still possible to improve system description when the nonuniqueness problem occurs, although this possibility has barely been investigated in the past. To this end, knowledge of the LEMS geometry can be used to derive additional constraints to choose an improved solution for the system description in a heuristic sense. One such approach was presented in [12] where the symmetry of a stereophonic array setup was exploited accordingly.

However, in [12] no solution is presented for systems with large numbers of loudspeakers and microphones, such as loudspeaker-enclosure-microphone systems.

Wave-domain adaptive filtering was proposed by Buchner et al. in 2004 for various adaptive filtering tasks in acoustic signal processing, including multichannel acoustic echo cancellation (MCAEC) [13], multichannel listening room equalization [27] and multichannel active noise control [28]. In 2008, Buchner and Spors published a formulation of the generalized frequency-domain adaptive filtering (GFDAF) algorithm [15] with application to MCAEC [14] for the use with wave-domain adaptive filtering (WDAF), however, disregarding the nonuniqueness problem [15].

SUMMARY

According to an embodiment, an apparatus for providing a current loudspeaker-enclosure-microphone system description of a loudspeaker-enclosure-microphone system, wherein the loudspeaker-enclosure-microphone system has a plurality of loudspeakers and a plurality of microphones, may have: a first transformation unit for generating a plurality of wave-domain loudspeaker audio signals, wherein the first transformation unit is configured to generate each of the wave-domain loudspeaker audio signals based on a plurality of time-domain loudspeaker audio signals and based on one or more of a plurality of loudspeaker-signal-transformation values, said one or more of the plurality of loudspeaker-signal-transformation values being assigned to said generated wave-domain loudspeaker audio signal, a second transformation unit for generating a plurality of wave-domain microphone audio signals, wherein the second transformation unit is configured to generate each of the wave-domain microphone audio signals based on a plurality of time-domain microphone audio signals and based on one or more of a plurality of microphone-signal-transformation values, said one or more of the plurality of microphone-signal-transformation values being assigned to said generated wave-domain microphone audio signal, and a system description generator for generating the current loudspeaker-enclosure-microphone system description based on the plurality of wave-domain loudspeaker audio signals, and based on the plurality of wave-domain microphone audio signals, wherein the system description generator is configured to generate the loudspeaker-enclosure-microphone system description based on a plurality of coupling values, wherein each of the plurality of coupling values is assigned to one of a plurality of wave-domain pairs, each of the plurality of wave-domain pairs being a pair of one of the plurality of loudspeaker-signal-transformation values and one of the plurality of microphone-signal-transformation values, and wherein the system description generator is configured to determine each coupling value assigned to a wave-domain pair of the plurality of wave-domain pairs by determining for said wave-domain pair at least one relation indicator indicating a relation between one of the one or more loudspeaker-signal-transformation values of said wave-domain pair and one of the microphone-signal-transformation values of said wave-domain pair to generate the loudspeaker-enclosure-microphone system description.

According to another embodiment, a system may have: a plurality of loudspeakers of a loudspeaker-enclosure-microphone system, a plurality of microphones of the loudspeaker-enclosure-microphone system, and an apparatus for providing a current loudspeaker-enclosure-microphone system description of a loudspeaker-enclosure-microphone system as mentioned above, wherein the plurality of loudspeakers are arranged to receive a plurality of loudspeaker input signals, wherein the above apparatus is arranged to receive the plurality of loudspeaker input signals, wherein the plurality of microphones are configured to record a plurality of microphone input signals, wherein the above apparatus is arranged to receive the plurality of microphone input signals, and wherein the above apparatus is configured to adjust a loudspeaker-enclosure-microphone system description based on the received loudspeaker input signals and based on the received microphone input signals.

According to another embodiment, a system for generating filtered loudspeaker signals for a plurality of loudspeakers of a loudspeaker-enclosure-microphone system may have: a filter unit, and an apparatus for providing a current loudspeaker-enclosure-microphone system description of a loudspeaker-enclosure-microphone system as mentioned above, wherein the above apparatus is configured to provide a current loudspeaker-enclosure-microphone system description of the loudspeaker-enclosure-microphone system to the filter unit, wherein the filter unit is configured to adjust a loudspeaker signal filter based on the current loudspeaker-enclosure-microphone system description to obtain an adjusted filter, wherein the filter unit is arranged to receive a plurality of loudspeaker input signals, and wherein the filter unit is configured to filter the plurality of loudspeaker input signals by applying the adjusted filter on the loudspeaker input signals to obtain the filtered loudspeaker signals.

According to still another embodiment, a method for providing a current loudspeaker-enclosure-microphone system description of a loudspeaker-enclosure-microphone system, wherein the loudspeaker-enclosure-microphone system has a plurality of loudspeakers and a plurality of microphones, may have the steps of: generating a plurality of wave-domain loudspeaker audio signals by generating each of the wave-domain loudspeaker audio signals based on a plurality of time-domain loudspeaker audio signals and based on one or more of a plurality of loudspeaker-signal-transformation values, said one or more of the plurality of loudspeaker-signal-transformation values being assigned to said generated wave-domain loudspeaker audio signal, generating a plurality of wave-domain microphone audio signals by generating each of the wave-domain microphone audio signals based on a plurality of time-domain microphone audio signals and based on one or more of a plurality of microphone-signal-transformation values, said one or more of the plurality of microphone-signal-transformation values being assigned to said generated wave-domain microphone audio signal, and generating the current loudspeaker-enclosure-microphone system description based on the plurality of wave-domain loudspeaker audio signals, and based on the plurality of wave-domain microphone audio signals, wherein the loudspeaker-enclosure-microphone system description is generated based on a plurality of coupling values, wherein each of the plurality of coupling values is assigned to one of a plurality of wave-domain pairs, each of the plurality of wave-domain pairs being a pair of one of the plurality of loudspeaker-signal-transformation values and one of the plurality of microphone-signal-transformation values, and wherein each coupling value assigned to a wave-domain pair of the plurality of wave-domain pairs is determined by determining for said wave-domain pair at least one relation indicator indicating a relation between one of the one or more loudspeaker-signal-transformation values of said wave-domain pair and one of the microphone-signal-transformation values of said wave-domain pair to generate the loudspeaker-enclosure-microphone system description.

According to another embodiment, a method for determining at least two filter configurations of a loudspeaker signal filter for at least two different loudspeaker-enclosure-microphone system states, wherein the loudspeaker signal filter is arranged to filter a plurality of loudspeaker input signals to obtain a plurality of filtered loudspeaker signals for steering a plurality of loudspeakers of a loudspeaker-enclosure-microphone system, may have the steps of: determining a first loudspeaker-enclosure-microphone system description of a loudspeaker-enclosure-microphone system according to the above method for providing a current loudspeaker-enclosure-microphone system description of a loudspeaker-enclosure-microphone system, when the loudspeaker-enclosure-microphone system has a first state, determining a first filter configuration of the loudspeaker signal filter based on the first loudspeaker-enclosure-microphone system description, storing the first filter configuration in a memory, determining a second loudspeaker-enclosure-microphone system description of the loudspeaker-enclosure-microphone system according to the above method, when the loudspeaker-enclosure-microphone system has a second state, determining a second filter configuration of the loudspeaker signal filter based on the second loudspeaker-enclosure-microphone system description, and storing the second filter configuration in the memory.

Another embodiment may have a computer program for implementing the above method for providing a current loudspeaker-enclosure-microphone system description of a loudspeaker-enclosure-microphone system or the above method for determining at least two filter configurations of a loudspeaker signal filter for at least two different loudspeaker-enclosure-microphone system states when being executed by a computer or processor.

Embodiments provide a wave-domain representation for the LEMS, where the relative weights of the true mode couplings exhibit a structure that is predictable to a certain extent. An adaptive filter is used, where the adaptation algorithm for adapting the LEMS identification is modified such that the mode coupling weights of the identified LEMS show the same structure as can be expected for the true LEMS represented in the wave domain. A wave-domain representation is characterized by using fundamental solutions of the wave equation as basis functions for the loudspeaker and microphone signals.

In embodiments, concepts for multichannel acoustic echo cancellation (MCAEC) systems are provided, which maintain robustness in the presence of the nonuniqueness problem without altering the loudspeaker signals. To this end, wave-domain adaptive filtering (WDAF) concepts are provided which use solutions of the wave equation as basis functions for a transform domain for the adaptive filtering. Consequently, the considered signal representations can be directly interpreted in terms of an ideally reproduced wave field and an actually reproduced wave field within the loudspeaker-enclosure-microphone system (LEMS). Using the fact that the relation between these two wave fields is predictable to a certain extent, additional nonrestrictive assumptions for an improved system description in the wave domain are provided. These assumptions are used to provide a modified version of the generalized frequency-domain adaptive filtering algorithm which was previously introduced for MCAEC. Moreover, a corresponding algorithm along with the necessary transforms and the results of an experimental evaluation are provided.

Embodiments provide concepts to mitigate the consequences of the nonuniqueness problem by using WDAF with a modified version of the GFDAF algorithm presented in [14]. The system description in the wave domain according to the provided embodiment leads to an increased robustness to the nonuniqueness problem. In embodiments, a wave-domain model is provided which reveals predictable properties of the LEMS. It can be shown that this approach significantly improves the robustness of an AEC for reproduction systems with many reproduction channels. Major benefits will also result for other applications by applying the proposed concepts. According to embodiments, predictable wave-domain properties are provided to improve the system description when the nonuniqueness problem occurs. This can significantly increase the robustness to changing correlation properties of the loudspeaker signals, while the loudspeaker signals themselves are not altered. Any technique necessitating a MIMO system description with a large number of reproduction channels can benefit from the provided embodiments. Notable examples are active noise control (ANC), AEC and listening room equalization.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the present invention will be explained with reference to the drawings, in which:

FIG. 1a illustrates an apparatus for identifying a loudspeaker-enclosure-microphone system according to an embodiment,

FIG. 1b illustrates an apparatus for identifying a loudspeaker-enclosure-microphone system according to another embodiment,

FIG. 2 illustrates a loudspeaker and microphone setup used in the LEMS to be identified, wherein the z=0 plane is depicted in cylindrical coordinates,

FIG. 3 illustrates a block diagram of a WDAF AEC system. G_(RS) illustrates a reproduction system, H illustrates a LEMS, T₁, T₂, and T₂⁻¹ illustrate transforms to and from the wave domain, and {tilde over (H)}(n) illustrates an adaptive LEMS model in the wave domain,

FIG. 4 illustrates logarithmic magnitudes (absolute values) of H_(μ,λ)(jω) and {tilde over (H)}_(m′,l′)(jω) in dB with μ=0, . . . , N_(M)−1, λ=0, . . . , N_(L)−1, and m′=−4, . . . , 5, l′=−23, . . . , 24, for different frequencies ω=2πf, f=1 kHz, 2 kHz, 4 kHz, normalized to the maximum of the subfigures in each row,

FIG. 5 is an exemplary illustration of mode coupling weights and additionally introduced cost. Illustration (a) of FIG. 5 depicts weights of couplings of the wave field components for the true LEMS {tilde over (H)}_(m,l)(jω), illustration (b) of FIG. 5 depicts the additional cost introduced by formula (4), and illustration (c) of FIG. 5 depicts the resulting weights of the identified LEMS Ĥ_(m,l)(jω),

FIG. 6a shows an exemplary loudspeaker and microphone setup used for ANC according to an embodiment,

FIG. 6b illustrates a block diagram of an ANC system according to an embodiment,

FIG. 6c illustrates a block diagram of an LRE system according to an embodiment,

FIG. 6d illustrates an algorithm of a signal model of an LRE system according to an embodiment,

FIG. 6e illustrates a signal model for the Filtered-X GFDAF according to an embodiment,

FIG. 6f illustrates a system for generating filtered loudspeaker signals for a plurality of loudspeakers of a loudspeaker-enclosure-microphone system according to an embodiment,

FIG. 6g illustrates a system for generating filtered loudspeaker signals for a plurality of loudspeakers of a loudspeaker-enclosure-microphone system according to an embodiment showing more details,

FIG. 7 illustrates ERLE and the normalized misalignment (NMA) for a first WDAF AEC according to the state of the art and for a second WDAF AEC according to an embodiment,

FIG. 8 illustrates ERLE and the normalized misalignment (NMA) for a WDAF AEC with a suboptimal initialization value S(0), and

FIG. 9 illustrates ERLE and the normalized misalignment (NMA) for a WDAF AEC in the presence of short interfering signals, wherein the interferers are present at t=5 s and t=15 s for 50 ms, and wherein at t=25 s the incidence angle of the synthesized plane wave was changed.

DETAILED DESCRIPTION OF THE INVENTION

FIG. 1a illustrates an apparatus for providing a current loudspeaker-enclosure-microphone system description of a loudspeaker-enclosure-microphone system according to an embodiment. In particular, an apparatus for providing a current loudspeaker-enclosure-microphone system description ({tilde over (H)}(n)) of a loudspeaker-enclosure-microphone system is provided. The loudspeaker-enclosure-microphone system comprises a plurality of loudspeakers (110; 210; 610) and a plurality of microphones (120; 220; 620).

The apparatus comprises a first transformation unit (130; 330; 630) for generating a plurality of wave-domain loudspeaker audio signals ({tilde over (x)}₀(n), . . . , {tilde over (x)}_(l)(n), . . . , {tilde over (x)}_(N_L−1)(n)), wherein the first transformation unit (130; 330; 630) is configured to generate each of the wave-domain loudspeaker audio signals ({tilde over (x)}₀(n), . . . , {tilde over (x)}_(l)(n), . . . , {tilde over (x)}_(N_L−1)(n)) based on a plurality of time-domain loudspeaker audio signals (x₀(n), . . . , x_(λ)(n), . . . , x_(N_L−1)(n)) and based on one or more of a plurality of loudspeaker-signal-transformation values (l; l′), said one or more of the plurality of loudspeaker-signal-transformation values (l; l′) being assigned to said generated wave-domain loudspeaker audio signal.

Moreover, the apparatus comprises a second transformation unit (140; 340; 640) for generating a plurality of wave-domain microphone audio signals ({tilde over (d)}₀(n), . . . , {tilde over (d)}_(m)(n), . . . , {tilde over (d)}_(N_M−1)(n)), wherein the second transformation unit (140; 340; 640) is configured to generate each of the wave-domain microphone audio signals ({tilde over (d)}₀(n), . . . , {tilde over (d)}_(m)(n), . . . , {tilde over (d)}_(N_M−1)(n)) based on a plurality of time-domain microphone audio signals (d₀(n), . . . , d_(μ)(n), . . . , d_(N_M−1)(n)) and based on one or more of a plurality of microphone-signal-transformation values (m; m′), said one or more of the plurality of microphone-signal-transformation values (m; m′) being assigned to said generated wave-domain microphone audio signal.

Furthermore, the apparatus comprises a system description generator (150) for generating the current loudspeaker-enclosure-microphone system description based on the plurality of wave-domain loudspeaker audio signals ({tilde over (x)}₀(n), . . . , {tilde over (x)}_(l)(n), . . . , {tilde over (x)}_(N_L−1)(n)), and based on the plurality of wave-domain microphone audio signals ({tilde over (d)}₀(n), . . . , {tilde over (d)}_(m)(n), . . . , {tilde over (d)}_(N_M−1)(n)).

The system description generator (150) is configured to generate the loudspeaker-enclosure-microphone system description based on a plurality of coupling values, wherein each of the plurality of coupling values is assigned to one of a plurality of wave-domain pairs, each of the plurality of wave-domain pairs being a pair of one of the plurality of loudspeaker-signal-transformation values (l; l′) and one of the plurality of microphone-signal-transformation values (m; m′).

Moreover, the system description generator (150) is configured to determine each coupling value assigned to a wave-domain pair of the plurality of wave-domain pairs by determining for said wave-domain pair at least one relation indicator indicating a relation between one of the one or more loudspeaker-signal-transformation values of said wave-domain pair and one of the microphone-signal-transformation values of said wave-domain pair to generate the loudspeaker-enclosure-microphone system description.

FIG. 1b illustrates an apparatus for providing a current loudspeaker-enclosure-microphone system description of a loudspeaker-enclosure-microphone system according to another embodiment. The loudspeaker-enclosure-microphone system comprises a plurality of loudspeakers and a plurality of microphones.

A plurality of time-domain loudspeaker audio signals x₀(n), . . . , x_(λ)(n), . . . , x_(N_L−1)(n) are fed into a plurality of loudspeakers 110 of a loudspeaker-enclosure-microphone system (LEMS). The plurality of time-domain loudspeaker audio signals x₀(n), . . . , x_(λ)(n), . . . , x_(N_L−1)(n) is also fed into a first transformation unit 130. Although, for illustrative purposes, only three time-domain loudspeaker audio signals are depicted in FIG. 1b, it is assumed that all loudspeakers of the LEMS are connected to time-domain loudspeaker audio signals and these time-domain loudspeaker audio signals are also fed into the first transformation unit 130.

The apparatus comprises a first transformation unit 130 for generating a plurality of wave-domain loudspeaker audio signals {tilde over (x)}₀(n), . . . , {tilde over (x)}_(l)(n), . . . , {tilde over (x)}_(N_L−1)(n), wherein the first transformation unit 130 is configured to generate each of the wave-domain loudspeaker audio signals {tilde over (x)}₀(n), . . . , {tilde over (x)}_(l)(n), . . . , {tilde over (x)}_(N_L−1)(n) based on the plurality of time-domain loudspeaker audio signals x₀(n), . . . , x_(λ)(n), . . . , x_(N_L−1)(n) and based on one of a plurality of loudspeaker-signal-transformation mode orders (not shown). In other words: The mode order employed determines how the first transformation unit 130 conducts the transformation to obtain the corresponding wave-domain loudspeaker audio signal. The loudspeaker-signal-transformation mode order employed is a loudspeaker-signal-transformation value.

Furthermore, the plurality of microphones 120 of the LEMS record a plurality of time-domain microphone audio signals d₀(n), . . . , d_(μ)(n), . . . , d_(N_M−1)(n). Although, for illustrative purposes, only three time-domain audio signals d₀(n), . . . , d_(μ)(n), . . . , d_(N_M−1)(n) recorded by three microphones 120 of the LEMS are shown, it is assumed that each microphone 120 of the LEMS records a time-domain microphone audio signal and all these microphone audio signals are fed into a second transformation unit 140.

The second transformation unit 140 is adapted to generate a plurality of wave-domain microphone audio signals {tilde over (d)}₀(n), . . . , {tilde over (d)}_(m)(n), . . . , {tilde over (d)}_(N_M−1)(n), wherein the second transformation unit 140 is configured to generate each of the wave-domain microphone audio signals {tilde over (d)}₀(n), . . . , {tilde over (d)}_(m)(n), . . . , {tilde over (d)}_(N_M−1)(n) based on a plurality of time-domain microphone audio signals d₀(n), . . . , d_(μ)(n), . . . , d_(N_M−1)(n) and based on one of a plurality of microphone-signal-transformation mode orders (not shown). In other words: The mode order employed determines how the second transformation unit 140 conducts the transformation to obtain the corresponding wave-domain microphone audio signal. The microphone-signal-transformation mode order employed is a microphone-signal-transformation value.
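As a purely illustrative sketch of how a mode order can determine the transformation conducted by a transformation unit, the following example assumes a uniform circular transducer array and uses a spatial discrete Fourier transform over the transducer index. The choice of transform, the function name `wave_domain_signal`, the channel count and the test signal are assumptions made for illustration only and are not taken from the embodiments.

```python
import cmath
import math

def wave_domain_signal(samples, mode_order):
    """Project one time instant of all transducer signals onto one mode order.

    Assumption: uniform circular array, so transducer k sits at angle 2*pi*k/N
    and the projection is a spatial DFT over the transducer index.
    """
    n = len(samples)
    return sum(
        x * cmath.exp(-2j * cmath.pi * mode_order * k / n)
        for k, x in enumerate(samples)
    ) / n

num_channels = 8  # hypothetical number of transducers
# Hypothetical snapshot: a first-order angular pattern cos(angle) across the array.
samples = [math.cos(2 * math.pi * k / num_channels) for k in range(num_channels)]

# One wave-domain signal value per mode order; each mode order selects its own kernel.
wave_domain = {order: wave_domain_signal(samples, order) for order in range(-3, 5)}
for order, value in wave_domain.items():
    print(order, round(abs(value), 6))
```

Under these assumptions the energy of the snapshot is concentrated in the mode orders +1 and −1, while all other mode orders remain close to zero.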

Furthermore, the apparatus comprises a system description generator 150. The system description generator 150 comprises a system description application unit 160, an error determiner 170 and a system description generation unit 180.

The system description application unit 160 is configured to generate a plurality of wave-domain microphone estimation signals {tilde over (y)}₀(n), . . . , {tilde over (y)}_(m)(n), . . . , {tilde over (y)}_(N_M−1)(n) based on the wave-domain loudspeaker audio signals {tilde over (x)}₀(n), . . . , {tilde over (x)}_(l)(n), . . . , {tilde over (x)}_(N_L−1)(n) and based on a previous loudspeaker-enclosure-microphone system description of the loudspeaker-enclosure-microphone system.

The error determiner 170 is configured to determine a plurality of wave-domain error signals {tilde over (d)} ₀(n), . . . , {tilde over (d)} _(m)(n), . . . , {tilde over (d)} _(N_M−1)(n) based on the plurality of wave-domain microphone audio signals {tilde over (d)}₀(n), . . . , {tilde over (d)}_(m)(n), . . . , {tilde over (d)}_(N_M−1)(n) and based on the plurality of wave-domain microphone estimation signals {tilde over (y)}₀(n), . . . , {tilde over (y)}_(m)(n), . . . , {tilde over (y)}_(N_M−1)(n).

The system description generation unit 180 is configured to generate the current loudspeaker-enclosure-microphone system description based on the wave-domain loudspeaker audio signals {tilde over (x)}₀(n), . . . , {tilde over (x)}_(l)(n), . . . , {tilde over (x)}_(N_L−1)(n) and based on the plurality of error signals {tilde over (d)} ₀(n), . . . , {tilde over (d)} _(m)(n), . . . , {tilde over (d)} _(N_M−1)(n).
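The interplay of the system description application unit 160, the error determiner 170 and the system description generation unit 180 can be sketched as a loop of estimation, error computation and update. The sketch below is a strong simplification: it assumes a memoryless real-valued model, assumes that the error is the difference between microphone signal and estimate, and uses a normalized LMS update as a stand-in for the adaptation algorithm. The function names and all values are hypothetical.

```python
def estimate(model, x):
    """Unit 160 (sketch): microphone estimates from loudspeaker signals and the previous model."""
    return [sum(h * xl for h, xl in zip(row, x)) for row in model]

def error(d, y):
    """Unit 170 (sketch): assumed here to be microphone signal minus estimate."""
    return [dm - ym for dm, ym in zip(d, y)]

def update(model, x, e, step=1.0):
    """Unit 180 (sketch): normalized LMS update, used only as a stand-in."""
    power = sum(xl * xl for xl in x) or 1.0
    return [
        [h + step * em * xl / power for h, xl in zip(row, x)]
        for row, em in zip(model, e)
    ]

true_lems = [[0.5, 0.25], [-0.25, 1.0]]  # hypothetical 2x2 wave-domain couplings
model = [[0.0, 0.0], [0.0, 0.0]]         # initial system description
inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

errors = []
for x in inputs:
    d = estimate(true_lems, x)   # observed microphone signals (no noise in this sketch)
    y = estimate(model, x)       # estimate from the previous system description
    e = error(d, y)
    errors.append(max(abs(v) for v in e))
    model = update(model, x, e)  # current system description

print(errors, model)
```

Because the hypothetical input vectors excite each loudspeaker component separately, the model matches the assumed true couplings after two updates and the third error is zero; with strongly correlated inputs this would generally not be the case.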

The system description generation unit 180 is configured to generate the loudspeaker-enclosure-microphone system description based on a first coupling value β₁ of the plurality of coupling values, when a first relation value indicating a first difference between a first loudspeaker-signal-transformation mode order l of the plurality of loudspeaker-signal mode orders (l; l′) and a first microphone-signal-transformation mode order m of the plurality of microphone-signal mode orders (m; m′) has a first difference value. Moreover, the system description generation unit 180 is configured to assign the first coupling value β₁ to a first wave-domain pair of the plurality of wave-domain pairs, when the first relation value has the first difference value. In this context, the first wave-domain pair is a pair of the first loudspeaker-signal mode order and the first microphone-signal mode order, and the first relation value is one of the plurality of relation indicators.

Furthermore, the system description generation unit 180 is configured to generate the loudspeaker-enclosure-microphone system description based on a second coupling value β₂ of the plurality of coupling values, when a second relation value indicating a second difference between a second loudspeaker-signal-transformation mode order l of the plurality of loudspeaker-signal-transformation mode orders l and a second microphone-signal-transformation mode order m of the plurality of microphone-signal-transformation mode orders m has a second difference value, being different from the first difference value. Moreover, the system description generation unit 180 is configured to assign the second coupling value β₂ to a second wave-domain pair of the plurality of wave-domain pairs, when the second relation value has the second difference value. In this context, the second wave-domain pair is a pair of the second loudspeaker-signal mode order of the plurality of loudspeaker-signal mode orders and the second microphone-signal mode order of the plurality of microphone-signal mode orders, the second wave-domain pair is different from the first wave-domain pair, and the second relation value is one of the plurality of relation indicators.

An example of coupling values is provided in formula (60) below, wherein c_(q)(n) are coupling values. In particular, in formula (60), β₁ is a first coupling value, β₂ is a second coupling value, and 1 is a third coupling value.

See formula (60):

$\begin{matrix}{{c_{q}(n)} = \left\{ \begin{matrix}\beta_{1} & {{{{when}\mspace{14mu}\Delta\;{m(q)}} = 0},} \\\beta_{2} & {{{{when}\mspace{14mu}\Delta\;{m(q)}} = 1},} \\1 & {{elsewhere},}\end{matrix} \right.} & (60)\end{matrix}$

An example of relation indicators is provided in formulae (60) and (61) below, wherein Δm(q) represents relation indicators. In particular, a first relation value being a relation indicator may have the value Δm(q)=0 and a second relation value being a relation indicator may have the value Δm(q)=1.

As can be seen in formula (61) below, the relation values represented by Δm(q) indicate a relation between one of the one or more loudspeaker-signal-transformation values and one of the one or more microphone-signal-transformation values, e.g. a relation between the loudspeaker-signal-transformation mode order l and the microphone-signal-transformation mode order m. In particular, Δm(q) represents a difference of the mode orders l′ and m′.

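See formula (61):

$\begin{matrix}{{\Delta m(q)} = {\min\left( {\left| {\left\lfloor {q/L_{H}} \right\rfloor - m} \right|,\left| {\left\lfloor {q/L_{H}} \right\rfloor - m - N_{L}} \right|} \right)}} & (61)\end{matrix}$

wherein the microphone-signal-transformation mode order is m, and wherein the loudspeaker-signal-transformation mode order l is defined by l=⌊q/L_(H)⌋.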

As can be seen in formulae (60) and (61), when the absolute difference between the third loudspeaker-signal-transformation mode order (l=⌊q/L_(H)⌋) and the third microphone-signal-transformation mode order (m) is greater than the predefined threshold value (here: greater than 1.0), then the coupling value is a third value (1.0), being different from the first coupling value (β₁) and the second coupling value (β₂).
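The selection of a coupling value according to formulae (60) and (61) may be sketched as follows. This is a minimal illustration only: the function and parameter names are hypothetical, the numeric values for β₁ and β₂ are chosen by the caller, and the second argument of the minimum is taken here as the wrap-around distance |l−m−N_L|, which is an interpretation assumed for this sketch.

```python
def relation_indicator(q, m, filter_length, num_loudspeaker_modes):
    """Relation indicator Delta m(q) in the sense of formula (61).

    filter_length plays the role of L_H and num_loudspeaker_modes the role
    of N_L. The wrap-around term is an assumption of this sketch.
    """
    l = q // filter_length  # loudspeaker mode order l = floor(q / L_H)
    return min(abs(l - m), abs(l - m - num_loudspeaker_modes))


def coupling_value(q, m, filter_length, num_loudspeaker_modes, beta1, beta2):
    """Coupling value c_q(n) in the sense of formula (60)."""
    delta = relation_indicator(q, m, filter_length, num_loudspeaker_modes)
    if delta == 0:
        return beta1  # loudspeaker and microphone mode orders coincide
    if delta == 1:
        return beta2  # neighbouring mode orders
    return 1.0        # all remaining wave-domain pairs
```

With L_H=4 and N_L=8, for example, the coefficient index q=28 belongs to l=7, which the wrap-around term treats as a neighbour of m=0, so that β₂ is returned.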

The coupling value determined by employing formulae (60) and (61) may then, for example, be employed in formula (58):

$\begin{matrix}{{{\tilde{h}}_{m}(n)} = {{{\tilde{h}}_{m}(n - 1)} + {\left( {1 - \lambda_{a}} \right)\left( {{S(n)} + {C_{m}(n)}} \right)^{- 1} \cdot \left( {{W_{10}^{H}{X^{H}(n)}W_{10}^{H}{{\tilde{e}}_{m}(n)}} - {{C_{m}(n)}{{\tilde{h}}_{m}(n - 1)}}} \right)}}} & (58)\end{matrix}$

to obtain an updated LEMS description (see below).

For more details regarding formulae (58), (60) and (61) see the explanations provided below.
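To give an impression of how the coupling values act in an update of the form of formula (58), the following sketch applies the update per coefficient. It assumes, purely for illustration, that S(n) and C_m(n) are diagonal, so that the matrix inverse reduces to a division, and it treats the term W₁₀^H X^H(n) W₁₀^H {tilde over (e)}_m(n) as an already computed vector called `gradient`. All names and the default value of λ_a are hypothetical.

```python
def regularized_update(h_prev, gradient, s_diag, c_diag, lambda_a=0.95):
    """Per-coefficient version of an update shaped like formula (58).

    Assumes diagonal S(n) and C_m(n) (an assumption of this sketch).
    h_prev:   previous coefficients h_m(n-1)
    gradient: stands in for W_10^H X^H(n) W_10^H e_m(n)
    s_diag:   diagonal entries of S(n)
    c_diag:   diagonal entries of C_m(n), i.e. the coupling values c_q(n)
    """
    return [
        h + (1.0 - lambda_a) * (g - c * h) / (s + c)
        for h, g, s, c in zip(h_prev, gradient, s_diag, c_diag)
    ]
```

With a zero gradient, a coefficient with a large coupling value is pulled towards zero more strongly than one with a small coupling value, which is the intended effect of the additional term.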

In other embodiments, the loudspeaker-signal transformation values are not mode orders of circular harmonics, but mode indices of spherical harmonics, see below.

In further embodiments, the loudspeaker-signal transformation values are not mode orders of circular harmonics, but components representing a direction of plane waves, for example {tilde over (k)}_(x), {tilde over (k)}_(y), and {tilde over (k)}_(z) explained below with reference to formula (6k).

In the following, an overview of basic concepts of embodiments is provided. Afterwards, a prototype is described in general terms. Later on, embodiments are described in more detail.

Please note that in the following l and m are used instead of l′ and m′ to increase readability of the formulae.

FIG. 2 illustrates a loudspeaker and microphone setup used in the LEMS to be identified, wherein the z=0 plane is depicted in cylindrical coordinates. A plurality of loudspeakers 210 and a plurality of microphones 220 are depicted. It is assumed that the LEMS comprises N_(L) loudspeakers and N_(M) microphones. Angle α and radius ϱ describe polar coordinates.

FIG. 3 illustrates a block diagram of a corresponding WDAF AEC system for identifying a LEMS. G_(RS) (310) illustrates a reproduction system, H (320) illustrates a LEMS, T₁ (330), T₂ (340), and T₂⁻¹ (350) illustrate transforms to and from the wave domain, and {tilde over (H)}(n) (360) illustrates an adaptive LEMS model in the wave domain.

When considering the sound pressure P_(λ)^((x))(jω) emitted by the loudspeaker λ and the sound pressure P_(μ)^((d))(jω) measured by microphone μ in the frequency domain, a LEMS can be modeled through

$\begin{matrix}{{{P_{\mu}^{(d)}\left( {j\omega} \right)} = {\sum\limits_{\lambda = 0}^{N_{L} - 1}{{P_{\lambda}^{(x)}\left( {j\omega} \right)}{H_{\mu,\lambda}\left( {j\omega} \right)}}}},\quad{\mu = 0},1,\ldots,{N_{M} - 1},} & (1)\end{matrix}$

where H_(μ,λ)(jω) denotes the frequency responses between all N_(L) loudspeakers and N_(M) microphones. For many applications, the LEMS has to be identified, i.e., H_(μ,λ)(jω) ∀λ, μ have to be estimated. To this end, the present P_(λ)^((x))(jω) and P_(μ)^((d))(jω) are observed and the filter Ĥ_(μ,λ)(jω) ∀λ, μ is adapted, so that P_(μ)^((d))(jω) can be obtained by filtering P_(λ)^((x))(jω). Often, the loudspeaker signals are strongly cross-correlated, so estimating H_(μ,λ)(jω) is an underdetermined problem and the nonuniqueness problem occurs. When the observed signals are the only information considered, as is the case for the vast majority of system description approaches, this problem cannot be solved without altering the loudspeaker signals. However, even when leaving the loudspeaker signals untouched, it is possible to exploit additional knowledge to narrow the set of plausible estimates for H_(μ,λ)(jω), so that an estimate near the true solution can be heuristically determined. Corresponding concepts are provided in the following.
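The nonuniqueness problem can be made concrete with a small numerical sketch of formula (1) for a single microphone and a single frequency. The example assumes two loudspeakers driven by one and the same source spectrum (fully correlated signals); all numeric values and names are hypothetical and serve only for illustration.

```python
def microphone_spectrum(loudspeaker_spectra, responses):
    """Formula (1) for one microphone at one frequency."""
    return sum(p * h for p, h in zip(loudspeaker_spectra, responses))


source = 1.0 + 0.5j
loudspeakers = [source, source]            # fully correlated loudspeaker signals

true_responses = [0.8 + 0.1j, 0.2 - 0.1j]  # assumed "true" LEMS
other_responses = [0.5 + 0.0j, 0.5 + 0.0j] # a different candidate system

observed = microphone_spectrum(loudspeakers, true_responses)
explained = microphone_spectrum(loudspeakers, other_responses)

# Both systems explain the observation equally well, so the observed
# signals alone cannot distinguish them.
print(abs(observed - explained))
```

Any pair of responses whose sum equals that of the true responses reproduces the observed microphone spectrum in this setting, which is why additional knowledge is needed to select a plausible estimate.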

Modeling the LEMS in the wave domain uses knowledge about the transducer array geometries to exploit certain properties of the LEMS. For a wave-domain model of the LEMS, the loudspeaker signals P_(λ)^((x))(jω) and the microphone signals P_(μ)^((d))(jω) are transformed to their wave-domain representations. The wave-domain representation of the microphone signals, the so-called measured wave field, describes the sound pressure measured by the microphones using fundamental solutions of the wave equation. The wave-domain representation of the loudspeaker signals is called free-field description as it describes the wave field as it would ideally be excited by the loudspeakers in the free-field case. This is done at the microphone positions using the same basis functions as for the measured wave field. The class of wave-domain basis functions includes (but is not limited to) plane waves, spherical harmonics and circular harmonics. For the sake of brevity, in the following, the description relates to circular harmonics, and P_(λ)^((x))(jω) is transformed to {tilde over (P)}_(l)^((x))(jω) and P_(μ)^((d))(jω) to {tilde over (P)}_(m)^((d))(jω) according to [23]. Other embodiments cover plane waves or spherical harmonics.

The sound pressure P(α, ϱ, jω) at angle α and radius ϱ describing polar coordinates is represented according to

$\begin{matrix}{{{P\left( {\alpha,\varrho,{j\omega}} \right)} = {\sum\limits_{l = {- \infty}}^{\infty}{\left( {{{{\tilde{P}}_{l}^{(1)}\left( {j\omega} \right)}{\mathcal{H}_{l}^{(1)}\left( {\frac{\omega}{c}\varrho} \right)}} + {{{\tilde{P}}_{l}^{(2)}\left( {j\omega} \right)}{\mathcal{H}_{l}^{(2)}\left( {\frac{\omega}{c}\varrho} \right)}}} \right)e^{jl\alpha}}}},} & (2)\end{matrix}$

where {tilde over (P)}_(l)⁽¹⁾(jω) and {tilde over (P)}_(l)⁽²⁾(jω) are spectra of outgoing and incoming waves, respectively. Both signal representations, {tilde over (P)}_(l)^((x))(jω) and {tilde over (P)}_(m)^((d))(jω), result from a superposition of {tilde over (P)}_(l)⁽¹⁾(jω) and {tilde over (P)}_(l)⁽²⁾(jω) as described in [23]. The choice of these basis functions was motivated by the circular array setup considered in [23], which is illustrated by FIG. 2. Circular harmonics are just one example of a whole class of basis functions which can be used for a wave-domain representation. Other examples are plane waves [13], cylindrical harmonics, or spherical harmonics, as they all denote fundamental solutions of the wave equation.

Using the wave-domain signal representations, an equivalent to (1) may be formulated by

$\begin{matrix}{{{{\tilde{P}}_{m}^{(d)}\left( {j\omega} \right)} = {\sum\limits_{l = {{{- N_{L}}/2} + 1}}^{N_{L}/2}{{{\tilde{H}}_{m,l}\left( {j\omega} \right)}{{\tilde{P}}_{l}^{(x)}\left( {j\omega} \right)}}}},\quad{m = {{{- N_{M}}/2} + 1}},\ldots,{N_{M}/2}} & (3)\end{matrix}$

where {tilde over (H)}_(m,l)(jω) describes the coupling of mode l in {tilde over (P)}_(l)^((x))(jω) and mode m in {tilde over (P)}_(m)^((d))(jω). An example of H_(μ,λ)(jω) and {tilde over (H)}_(m,l)(jω) for a LEMS with N_(L)=48 loudspeakers on a circle of radius R_(L)=1.5 m, N_(M)=10 microphones on a circle of radius R_(M)=0.05 m, and a real room with a reverberation time T₆₀ of 0.3 s is shown in FIG. 4 to illustrate the different properties of both models. While the weights of H_(μ,λ)(jω) appear to be similar for all λ and μ, {tilde over (H)}_(m,l)(jω) shows a clearly distinguishable structure with dominant {tilde over (H)}_(m,l)(jω) for certain combinations of m and l. For a wave-domain model, this structure may be formulated for any LEMS, in contrast to a conventional model, where the weights may differ significantly, depending on the loudspeaker and microphone positions. This property has already been used to obtain an approximate model for the LEMS to increase computational efficiency [13, 23].

Embodiments exploit this property in a different way. As the weights of {tilde over (H)}_(m,l)(jω) are predictable to a certain extent, they make it possible to assess the plausibility of a particular estimate. Moreover, it is possible to modify adaptation algorithms for system description so that estimates of {tilde over (H)}_(m,l)(jω) depicting similar weights to the true solution are obtained. Those estimates can then be expected to be close to the true solution. For a system description in the wave domain without following the proposed approach, an estimate Ĥ_(m,l)(jω) would be implicitly determined for {tilde over (H)}_(m,l)(jω) by obtaining a least squares estimate for {tilde over (P)}_(m)^((d))(jω) with a model according to (3). One possibility to realize the proposed approach is to modify the resulting least squares cost function, which originally only considered the deviation of {tilde over (P)}_(m)^((d))(jω) from its estimate. Such a modification can be the addition of a term representing

$\begin{matrix}{\int_{- \infty}^{\infty}{{\left| {{\hat{H}}_{m,l}\left( {j\omega} \right)} \right|^{2}}{C\left( \left| {m - l} \right| \right)}d\omega}} & (4a)\end{matrix}$

with C(|m−l|) being a monotonically growing cost function for increasing |m−l| for the considered example of circular harmonics. For other wave-domain basis functions, C(|m−l|) is replaced by an appropriate function, possibly depending on multiple variables. Such a modification regularizes the problem of system description in a physically motivated manner, but is in general independent of a possibly used regularization of the underlying adaptation algorithm.

A minimization of the modified cost function leads to an estimate Ĥ_(m,l)(jω) depicting weights similar to those shown for {tilde over (H)}_(m,l)(jω) in FIG. 4. An illustration of mode coupling weight and corresponding cost is shown in FIG. 5. A modification according to (4a) is just one of several ways to implement the concepts provided by embodiments. As the set of possible estimates Ĥ_(m,l)(jω) is still unbounded, we refer to this modification as introducing a non-restrictive constraint.
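A discrete counterpart of the additional term (4a) may be sketched as follows. The sketch assumes that each estimate Ĥ_(m,l)(jω) is available as a list of complex frequency-bin values, that the integral is replaced by a plain sum over the bins, and that the mode orders are simply enumerated from zero. The linear cost function used as default is only one hypothetical choice of a monotonically growing C.

```python
def coupling_penalty(estimate, cost=lambda distance: float(distance)):
    """Sum over all mode pairs of |H_hat[m][l]|^2 weighted by C(|m - l|).

    estimate[m][l] is a list of complex frequency-bin values
    (hypothetical data layout chosen for this sketch).
    """
    total = 0.0
    for m, row in enumerate(estimate):
        for l, bins in enumerate(row):
            energy = sum(abs(value) ** 2 for value in bins)
            total += energy * cost(abs(m - l))
    return total


# An estimate that couples only identical mode orders is not penalized,
# whereas energy in couplings of differing mode orders is.
diagonal = [[[1.0], [0.0]], [[0.0], [1.0]]]
off_diagonal = [[[0.0], [1.0]], [[1.0], [0.0]]]
print(coupling_penalty(diagonal), coupling_penalty(off_diagonal))
```

Because the penalty only adds a cost and does not exclude any estimate, it corresponds to a non-restrictive constraint in the above sense.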

Another possibility is to necessitate an estimate Ĥ_(m,l)(jω) to fulfill

$\begin{matrix}{{\int_{- \infty}^{\infty}{{{{\hat{H}}_{m,l_{1}}\left( {j\;\omega} \right)}}^{2}d\;\omega}} > {\int_{- \infty}^{\infty}{{{{\hat{H}}_{m,l_{2}}\left( {j\;\omega} \right)}}^{2}d\;\omega\;{\forall{{{l_{2} - m}} > {{l_{1} - m}}}}}}} & \left( {4b} \right)\end{matrix}$which would then be a restrictive constraint.
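Whether a given estimate fulfills a constraint of the form (4b) can be checked, for illustration, as sketched below. As before, the sketch assumes a hypothetical data layout in which `estimate[m][l]` holds the complex frequency-bin values of Ĥ_(m,l)(jω), replaces the integral by a sum over bins, and enumerates mode orders from zero.

```python
def satisfies_restrictive_constraint(estimate):
    """Check a discrete version of (4b) for every microphone mode m.

    For each m, the coupling energy has to be strictly larger for l1
    than for l2 whenever |l2 - m| > |l1 - m|.
    """
    for m, row in enumerate(estimate):
        energies = [sum(abs(v) ** 2 for v in bins) for bins in row]
        for l1, energy1 in enumerate(energies):
            for l2, energy2 in enumerate(energies):
                if abs(l2 - m) > abs(l1 - m) and not energy1 > energy2:
                    return False
    return True
```

An adaptation algorithm enforcing such a check would discard or correct estimates for which the function returns False, which is what makes the constraint restrictive.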

According to embodiments, a variety of constraints may be formulated, where (4a) and (4b) describe just two possible realizations.

In the following, a prototype is described in general terms.

The prototype of an AEC according to an embodiment is briefly described and an excerpt of its experimental evaluation is given. AEC is commonly used to remove the unwanted loudspeaker echo from the recorded microphone signals while preserving the desired signals of the local acoustic scene without quality degradation. This is necessitated to use a reproduction system in communication scenarios like teleconferencing and acoustic human-machine-interaction.

FIG. 3 illustrates a block diagram depicting the signal model of a wave-domain AEC according to an embodiment. There, the continuous frequency-domain quantities used in the previous section are represented by vectors of discrete-time signals with the block time index n. The signal quantities x(n) and d(n) correspond to P_(λ)^((x))(jω) and P_(μ)^((d))(jω), respectively. Similarly, the wave-domain representations {tilde over (x)}(n) and {tilde over (d)}(n) correspond to {tilde over (P)}_(l)^((x))(jω) and {tilde over (P)}_(m)^((d))(jω), respectively. The wave-domain representation {tilde over (y)}(n) denotes an estimate for {tilde over (d)}(n) and {tilde over (e)}(n)={tilde over (d)}(n)−{tilde over (y)}(n) is the adaptation error in the wave domain. This error is transformed back to the microphone signal domain, where it is denoted as e(n). The transforms T₁, T₂ and T₂⁻¹ denote transforms to and from the wave domain, H corresponds to H_(μ,λ)(jω) and {tilde over (H)}(n) to its wave-domain estimate Ĥ_(m,l)(jω).

In the following, an excerpt of an experimental evaluation of the mentioned AEC will be provided. To this end, the two most important measures for an AEC are considered. The so-called “Echo Return Loss Enhancement” (ERLE) provides a measure for the achieved echo cancellation and is here defined as

$\begin{matrix}{{{{ERLE}(n)} = {{10\,{\log_{10}\left( \frac{\left\| {\tilde{d}(n)} \right\|_{2}^{2}}{\left\| {\tilde{e}(n)} \right\|_{2}^{2}} \right)}} = {10\,{\log_{10}\left( \frac{\left\| {d(n)} \right\|_{2}^{2}}{\left\| {e(n)} \right\|_{2}^{2}} \right)}}}},} & (5a)\end{matrix}$

where ∥·∥₂ stands for the Euclidean norm. The normalized misalignment is a metric to determine the distance of the identified LEMS from the true one, i.e., the distance of Ĥ_(m,l)(jω) and {tilde over (H)}_(m,l)(jω). For the system described here, this measure can be formulated as follows:

$\begin{matrix}{{{\Delta_{H}(n)} = {10\,{\log_{10}\left( \frac{\left\| {{T_{2}H} - {{\tilde{H}(n)}T_{1}}} \right\|_{F}^{2}}{\left\| {T_{2}H} \right\|_{F}^{2}} \right)}}},} & (5b)\end{matrix}$

where ∥·∥_(F) stands for the Frobenius norm.
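Both measures may be computed along the lines of the following sketch. It assumes that the signal blocks are given as flat lists of samples and that the two systems to be compared are already available as matrices of equal shape in the same domain, i.e. `reference` stands for T₂H and `estimate` for {tilde over (H)}(n)T₁. The function names are hypothetical.

```python
import math


def erle_db(microphone_block, error_block):
    """ERLE in dB as the ratio of squared Euclidean norms, cf. (5a)."""
    numerator = sum(abs(v) ** 2 for v in microphone_block)
    denominator = sum(abs(v) ** 2 for v in error_block)
    return 10.0 * math.log10(numerator / denominator)


def squared_frobenius_norm(matrix):
    """Sum of squared magnitudes of all matrix entries."""
    return sum(abs(value) ** 2 for row in matrix for value in row)


def normalized_misalignment_db(reference, estimate):
    """Normalized misalignment in dB, cf. (5b)."""
    difference = [
        [r - e for r, e in zip(ref_row, est_row)]
        for ref_row, est_row in zip(reference, estimate)
    ]
    return 10.0 * math.log10(
        squared_frobenius_norm(difference) / squared_frobenius_norm(reference)
    )
```

A larger ERLE indicates a stronger echo attenuation, whereas a lower (more negative) normalized misalignment indicates an estimate closer to the true system.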

FIG. 8 shows ERLE and normalized misalignment for the built prototype in comparison to a conventional generation of a system description. In this scenario, two plane waves were synthesized by a WFS system, first alternatingly and then simultaneously. Within the first five seconds, the first plane wave with an incidence angle of φ=0 was synthesized; during the following five seconds, the second plane wave with an incidence angle of φ=π/2 was synthesized. Within the last five seconds, both plane waves were simultaneously synthesized. Mutually uncorrelated white noise signals were used as source signals for the plane waves. The considered LEMS was already described above. The parameters for the adaptive filters can be considered as being nearly optimal.

Most attention in this discussion is given to the normalized misalignment, because a lower misalignment denotes a better system description. As the 48 loudspeaker signals were obtained from only two source signals, the identification of the LEMS is a severely underdetermined problem. Consequently, the achieved absolute normalized misalignment cannot be expected to be very low. However, the AEC implementing the proposed invention shows a significant improvement. We can see that the adaptation algorithm with the modified cost function achieves a misalignment of −1.6 dB while the original adaptation algorithm only achieves −0.2 dB. Please note that a value of −0.2 dB is almost the minimal misalignment which can be expected when only considering microphone and loudspeaker signals in such a scenario. Even though this experiment was conducted under optimal conditions, i.e., in absence of noise or interferences in the microphone signal, the better system description already leads to a better echo cancellation. The anticipated breakdown of the ERLE when the activity of both plane waves switches is less pronounced for the modified adaptation algorithm than for the original approach. Moreover, the modified algorithm is able to achieve a larger steady-state ERLE, which points to the fact that the considered original algorithm is trapped in a local minimum due to the frequency-domain approximation [14], which is necessitated for both algorithms.

In practice, benevolent laboratory conditions, as described in the previous experiment, are typically not present. One problem for the system description can be a double-talk situation, i.e., the simultaneous activity of the loudspeaker signals and the local acoustic scene. The adaptation of the filters is then typically stalled under such conditions to avoid a diverging system description. However, such a situation cannot always be reliably detected and adaptation steps during double-talk may occur. Therefore, an experiment was conducted to study the behavior of an AEC in this case. To this end, a scenario similar to the previous experiment was considered, where the first plane wave was synthesized during the first 25 seconds and the second plane wave was synthesized within the last 5 seconds. To simulate an undetected double-talk situation, short noise bursts were introduced into the microphone signal, leading to approximately two misled adaptation steps. The results are shown in FIG. 9. Considering the misalignment, it can be seen that both algorithms are negatively affected by these adaptation steps. The modified adaptation algorithm can, however, recover quickly from the divergence, in contrast to the original algorithm. Regarding the ERLE, both algorithms show a significant breakdown and a following recovery with every disturbance. For the original algorithm, we can see that the steady-state ERLE worsens with every recovery, while the steady-state performance of the modified algorithm is not significantly affected. When the activity of both plane waves changes, the ERLE breakdown of the original algorithm is clearly more pronounced than for the modified algorithm.

The shown increase of robustness is expected to be also beneficial for other applications, e.g., listening room equalization.

In the following, embodiments will be provided, wherein different WDAF basis functions will be employed. Moreover, in the following, we use {tilde over (l)}=l′ and {tilde over (m)}=m′. The explanations in the following will be focused on circular harmonics, spherical harmonics and plane waves as WDAF basis functions. It should be noted that the present invention is equally applicable with other WDAF basis functions, such as, for example, cylindrical harmonics.

At first, a LEMS description using different WDAF basis functions is provided. For WDAF, the considered loudspeaker and microphone signals are represented by a superposition of chosen basis functions which are fundamental solutions of the wave equation evaluated at the microphone positions. Consequently, the wave-domain signals describe a sound field within a spatial continuum. Each individual considered fundamental solution of the wave equation is referred to as a wave field component and is uniquely identified by one or more mode orders, one or more wave numbers or any combination thereof.

The wave-domain loudspeaker signals describe the wave field as it would ideally be excited at the microphone positions in the free field case, decomposed into its wave field components. The wave-domain microphone signals describe the sound pressure measured by the microphones in terms of the chosen basis functions.

In the wave domain, a LEMS is described by the way it distorts the reproduced wave field with respect to the wave field which would ideally be excited in the free field case. Consequently, this description is formulated as couplings of the wave-domain loudspeaker signals and the wave-domain microphone signals.

In the free field case, there is no distortion of the reproduced wave field, and only those wave field components of the wave-domain loudspeaker and microphone signals that share identical mode orders or wave numbers are coupled. For typical room shapes with no significant obstacles between loudspeakers and microphones, the reproduced wave field is only moderately distorted. So the couplings between wave field components of the transformed loudspeaker signals and wave field components of the transformed microphone signals which describe similar sound fields are stronger than the coupling of wave field components describing very different sound fields. The difference of the sound field described by different wave field components is measured by a distance function which is described below after the review of different basis functions for WDAF.

For WDAF, different fundamental solutions of the wave equation can be used. Examples are: circular harmonics, plane waves and spherical harmonics. Those basis functions are used to describe the sound pressure P({right arrow over (x)},jω) at the position {right arrow over (x)}, here described in the continuous frequency domain, where ω is the angular frequency. Alternatively, cylindrical harmonics may be used.

At first, circular harmonics are considered. When using circular harmonics, we describe {right arrow over (x)}=(α, ϱ)^(T) in polar coordinates with an angle α and a radius ϱ and we obtain the following superposition to describe the sound pressure at this point

$\begin{matrix}{{P\left( {\alpha,\varrho,{j\omega}} \right)} = {\sum\limits_{\tilde{m} = {- \infty}}^{\infty}{\left( {{{{\tilde{P}}_{\tilde{m}}^{(1)}\left( {j\omega} \right)}{\mathcal{H}_{\tilde{m}}^{(1)}\left( {\frac{\omega}{c}\varrho} \right)}} + {{{\tilde{P}}_{\tilde{m}}^{(2)}\left( {j\omega} \right)}{\mathcal{H}_{\tilde{m}}^{(2)}\left( {\frac{\omega}{c}\varrho} \right)}}} \right)e^{j\tilde{m}\alpha}}}} & (6a)\end{matrix}$

where {tilde over (P)}_({tilde over (m)})⁽¹⁾(jω) and {tilde over (P)}_({tilde over (m)})⁽²⁾(jω) are spectra of outgoing and incoming waves, respectively. Here, H_({tilde over (m)})⁽¹⁾(x) and H_({tilde over (m)})⁽²⁾(x) are Hankel functions of the first and second kind and order {tilde over (m)}, respectively, c is the speed of sound, and j is used as the imaginary unit. Assuming no acoustic sources in the coordinate origin, we may reduce our consideration to a superposition of incoming and outgoing waves.

$\begin{matrix}{{P\left( {\alpha,\varrho,{j\omega}} \right)} = {\sum\limits_{\tilde{m} = {- \infty}}^{\infty}{{{\tilde{P}}_{\tilde{m}}^{(d)}\left( {j\omega} \right)}{\mathcal{B}_{\tilde{m}}\left( {j\omega} \right)}e^{j\tilde{m}\alpha}}}} & (6b)\end{matrix}$

where B_({tilde over (m)})(jω) depends on the presence of a scatterer within the microphone array, and is equal to the ordinary Bessel function of the first kind J_({tilde over (m)})(jω) in the free field [19]. A single wave field component describes the contribution

$\begin{matrix}{{{\tilde{P}}_{\tilde{m}}^{(d)}\left( {j\omega} \right)}{\mathcal{B}_{\tilde{m}}\left( {j\omega} \right)}e^{j\tilde{m}\alpha}} & (6c)\end{matrix}$

to the resulting sound field and is identified by its mode order {tilde over (m)}. So we denote the transformed microphone signals with {tilde over (P)}_({tilde over (m)})^((d))(jω) and the transformed loudspeaker signals with {tilde over (P)}_({tilde over (l)})^((x))(jω). The wave-domain model is then described by

$\begin{matrix}{{{\tilde{P}}_{\tilde{m}}^{(d)}\left( {j\omega} \right)} = {\sum\limits_{\tilde{l} = {- \infty}}^{\infty}{{{\tilde{H}}_{\tilde{m},\tilde{l}}\left( {j\omega} \right)}{{{\tilde{P}}_{\tilde{l}}^{(x)}\left( {j\omega} \right)}.}}}} & (6d)\end{matrix}$
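To illustrate how a decomposition into circular harmonics in the sense of formula (6b) may be obtained from sampled data, the following sketch uses a spatial discrete Fourier transform over the microphone angles. It assumes a uniform circular microphone array, a single frequency, and that the radial term B_({tilde over (m)}) is absorbed into the mode weights. These are simplifying assumptions of the sketch, and the function names are hypothetical.

```python
import cmath
import math


def sample_circle(coefficients, num_microphones):
    """Evaluate sum_m a_m * exp(j m alpha) at equally spaced angles.

    coefficients maps a mode order m to its complex weight a_m.
    """
    samples = []
    for mu in range(num_microphones):
        alpha = 2.0 * math.pi * mu / num_microphones
        samples.append(sum(weight * cmath.exp(1j * order * alpha)
                           for order, weight in coefficients.items()))
    return samples


def decompose(samples, orders):
    """Recover the mode weights by a spatial DFT over the angle."""
    count = len(samples)
    result = {}
    for order in orders:
        total = sum(p * cmath.exp(-2j * math.pi * order * mu / count)
                    for mu, p in enumerate(samples))
        result[order] = total / count
    return result


weights = {-1: 0.5j, 0: 1.0, 2: 0.25}
recovered = decompose(sample_circle(weights, 10), range(-2, 3))
```

With ten microphones, mode orders whose difference is a multiple of ten cannot be distinguished, so the recovery is exact only as long as the sound field contains a correspondingly limited set of mode orders.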

Now, spherical harmonics are considered. For spherical harmonics, we describe {right arrow over (x)}=(α, ϑ, ϱ)^(T) in spherical coordinates with an azimuth angle α, a polar angle ϑ and a radius ϱ and we obtain the following superposition to describe the sound pressure at this point

$\begin{matrix}{{P\left( {\alpha,\vartheta,\varrho,{j\omega}} \right)} = {\sum\limits_{\tilde{n} = 0}^{\infty}{\sum\limits_{\tilde{m} = {- \tilde{n}}}^{\tilde{n}}{\left( {{{{\tilde{p}}_{\tilde{m},\tilde{n}}^{(1)}\left( {j\omega} \right)}{h_{\tilde{n}}^{(1)}\left( {\frac{\omega}{c}\varrho} \right)}} + {{{\tilde{p}}_{\tilde{m},\tilde{n}}^{(2)}\left( {j\omega} \right)}{h_{\tilde{n}}^{(2)}\left( {\frac{\omega}{c}\varrho} \right)}}} \right){Y_{\tilde{n}}^{\tilde{m}}\left( {\vartheta,\alpha} \right)}}}}} & (6e)\end{matrix}$

Here, h_(ñ)⁽¹⁾(x) and h_(ñ)⁽²⁾(x) are spherical Hankel functions of the first and second kind and order ñ, respectively, and the spherical basis functions are given by

$\begin{matrix}{{Y_{\overset{\sim}{n}}^{\overset{\sim}{m}}\left( {\vartheta,\varphi} \right)} = {\sqrt{\frac{{2\;\overset{\sim}{n}} + 1}{4\;\pi}\frac{\left( {\overset{\sim}{n} - \overset{\sim}{m}} \right)!}{\left( {\overset{\sim}{n} + \overset{\sim}{m}} \right)!}}{\mathcal{P}_{\overset{\sim}{n}}^{\overset{\sim}{m}}\left( {\cos(\vartheta)} \right)}e^{j\;\overset{\sim}{m}\;\varphi}}} & \left( {6f} \right)\end{matrix}$with the associated Legendre polynomials

$\begin{matrix}{{\mathcal{P}_{\tilde{n}}^{\tilde{m}}(z)} = {\frac{\left( {- 1} \right)^{\tilde{m}}}{2^{\tilde{n}}{\tilde{n}!}}\left( {1 - z^{2}} \right)^{\tilde{m}/2}\frac{d^{\tilde{m} + \tilde{n}}}{dz^{\tilde{m} + \tilde{n}}}\left( {z^{2} - 1} \right)^{\tilde{n}}}} & (6g)\end{matrix}$

for {tilde over (m)}≥0. For negative {tilde over (m)}, the associated Legendre polynomials are defined by

$\begin{matrix}{{\mathcal{P}_{\tilde{n}}^{- \tilde{m}}(z)} = {\left( {- 1} \right)^{\tilde{m}}\frac{\left( {\tilde{n} - \tilde{m}} \right)!}{\left( {\tilde{n} + \tilde{m}} \right)!}{\mathcal{P}_{\tilde{n}}^{\tilde{m}}(z)}}} & (6h)\end{matrix}$

As can be seen from formulae (6e) to (6g), the spherical harmonics are identified by two mode order indices {tilde over (m)} and ñ. Again, {tilde over (p)}_({tilde over (m)},ñ)⁽¹⁾(jω) and {tilde over (p)}_({tilde over (m)},ñ)⁽²⁾(jω) describe spectra of incoming and outgoing waves with respect to the origin and we consider the superposition of both. So each spherical harmonic wave field component describes a contribution to the sound field according to

$\begin{matrix}{{{{\tilde{p}}_{\tilde{m},\tilde{n}}^{(d)}\left( {j\omega} \right)}{b_{\tilde{n}}\left( {\frac{\omega}{c}\varrho} \right)}{Y_{\tilde{n}}^{\tilde{m}}\left( {\vartheta,\alpha} \right)}},} & (6i)\end{matrix}$

where $b_{\tilde{n}}\left( {\frac{\omega}{c}\varrho} \right)$ is dependent on the boundary conditions at the coordinate origin, similar to $\mathcal{B}_{\tilde{m}}\left( {\frac{\omega}{c}\varrho} \right)$ for the circular harmonics. So we denote the transformed microphone signals with {tilde over (p)}_({tilde over (m)},ñ)^((d))(jω) and the transformed loudspeaker signals with {tilde over (p)}_({tilde over (l)},{tilde over (k)})^((x))(jω). The wave-domain model is then described by

$\begin{matrix}{{{\tilde{p}}_{\tilde{m},\tilde{n}}^{(d)}\left( {j\omega} \right)} = {\sum\limits_{\tilde{k} = 0}^{\infty}{\sum\limits_{\tilde{l} = {- \tilde{k}}}^{\tilde{k}}{{{\tilde{H}}_{\tilde{m},\tilde{n},\tilde{l},\tilde{k}}\left( {j\omega} \right)}{{{\tilde{p}}_{\tilde{l},\tilde{k}}^{(x)}\left( {j\omega} \right)}.}}}}} & (6j)\end{matrix}$

Now, plane waves are considered. For a plane wave signal representation in the wave domain, we describe

$\begin{matrix}{{P\left( {x,y,z,{j\omega}} \right)} = {\int_{- \infty}^{\infty}{\int_{- \infty}^{\infty}{\int_{- \infty}^{\infty}{{\tilde{P}\left( {{\tilde{k}}_{x},{\tilde{k}}_{y},{\tilde{k}}_{z},{j\omega}} \right)}e^{- {j{({{x{\tilde{k}}_{x}} + {y{\tilde{k}}_{y}} + {z{\tilde{k}}_{z}}})}}}d{\tilde{k}}_{z}\, d{\tilde{k}}_{y}\, d{\tilde{k}}_{x}}}}}} & (6k)\end{matrix}$

where {tilde over (P)}({tilde over (k)}_(x), {tilde over (k)}_(y), {tilde over (k)}_(z), jω) describes the plane wave representation of the sound field and is only non-zero if

${{\overset{\sim}{k}}_{x}^{2} + {\overset{\sim}{k}}_{y}^{2} + {\overset{\sim}{k}}_{z}^{2}} = {\frac{\omega^{2}}{c^{2}}.}$

Now, model discretization is described. The number of components describing a real-world sound field is typically not limited. However, for a realization of an adaptive filter, we have to restrict our considerations to a subset of all available wave field components. For circular harmonics, this is simply done by limiting the considered mode order |{tilde over (m)}|. When using plane waves, {tilde over (k)}_(x), {tilde over (k)}_(y), and {tilde over (k)}_(z) describe continuous values in contrast to the integer mode orders of circular or spherical harmonics. Furthermore, {tilde over (k)}_(x), {tilde over (k)}_(y), and {tilde over (k)}_(z) are bounded by

${{\tilde{k}}_{x}^{2} + {\tilde{k}}_{y}^{2} + {\tilde{k}}_{z}^{2}} = {\frac{\omega^{2}}{c^{2}}.}$

Consequently, they are discretized within their boundaries. Considering only plane waves traveling in the x-y-plane, an example of such a discretization can be

$\begin{matrix}{{\begin{pmatrix}{\overset{\sim}{k}}_{x} \\{\overset{\sim}{k}}_{y} \\{\overset{\sim}{k}}_{z}\end{pmatrix} = \begin{pmatrix}{\frac{\omega}{c}{\cos(\varphi)}} \\{\frac{\omega}{c}{\sin(\varphi)}} \\0\end{pmatrix}},{\varphi = \frac{p\; 2\;\pi}{P}},{p = 0},1,\ldots\mspace{14mu},{P - 1.}} & \left( {7\; a} \right)\end{matrix}$
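The discretization according to (7a) may be sketched as follows. The function name and the default speed of sound of 343 m/s are assumptions made for this illustration only.

```python
import math


def plane_wave_vectors(omega, num_directions, speed_of_sound=343.0):
    """Wave vectors (k_x, k_y, k_z) for P equally spaced directions, cf. (7a).

    omega is the angular frequency, num_directions corresponds to P.
    Only plane waves traveling in the x-y-plane are considered.
    """
    k = omega / speed_of_sound
    vectors = []
    for p in range(num_directions):
        phi = p * 2.0 * math.pi / num_directions
        vectors.append((k * math.cos(phi), k * math.sin(phi), 0.0))
    return vectors
```

Every wave vector generated in this way has the squared length ω²/c², so that the bound stated above is fulfilled by construction.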

The microphone signals are then described by {tilde over (P)}^((d))({tilde over (k)}_(x)^((d)), {tilde over (k)}_(y)^((d)), {tilde over (k)}_(z)^((d)), jω), and the loudspeaker signals by {tilde over (P)}^((x))({tilde over (k)}_(x)^((x)), {tilde over (k)}_(y)^((x)), {tilde over (k)}_(z)^((x)), jω). Given a suitable discretization, we may also describe the LEMS by a sum

$\begin{matrix}{{{\tilde{P}}^{(d)}\left( {{\tilde{k}}_{x}^{(d)},{\tilde{k}}_{y}^{(d)},{\tilde{k}}_{z}^{(d)},{j\omega}} \right)} = {\sum\limits_{{({{\tilde{k}}_{x}^{(x)},{\tilde{k}}_{y}^{(x)},{\tilde{k}}_{z}^{(x)}})} \in K}{{\tilde{H}\left( {{\tilde{k}}_{x}^{(d)},{\tilde{k}}_{y}^{(d)},{\tilde{k}}_{z}^{(d)},{\tilde{k}}_{x}^{(x)},{\tilde{k}}_{y}^{(x)},{\tilde{k}}_{z}^{(x)},{j\omega}} \right)} \cdot {{\tilde{P}}^{(x)}\left( {{\tilde{k}}_{x}^{(x)},{\tilde{k}}_{y}^{(x)},{\tilde{k}}_{z}^{(x)},{j\omega}} \right)}}}} & (7b)\end{matrix}$

where K is the set of ({tilde over (k)}_(x)^((x)), {tilde over (k)}_(y)^((x)), {tilde over (k)}_(z)^((x))) considered for the model discretization, for example, as described by (7a).

In the following, realizations of improved system identification for different basis functions according to embodiments are described. In particular, it is explained how the invention can be applied for WDAF systems using different basis functions. As mentioned above, the distortion of the reproduced wave field can be described by couplings of the wave field components in the transformed loudspeaker signals and in the transformed microphone signals (see formulae (6d), (6j), and (7b)). The couplings of the wave field components describing similar sound fields are stronger than the couplings of wave field components describing completely different sound fields. A measure of similarity can be given by the following functions.

For circular harmonics, we can simply use the absolute difference of the mode orders given by

$\begin{matrix}{D\left(\tilde{m},\tilde{l}\right) = \left|\tilde{m} - \tilde{l}\right|.} & (8a)\end{matrix}$

For spherical harmonics, we have to consider two mode indices for each wave-domain signal and obtain

$\begin{matrix}{D\left(\tilde{m},\tilde{n},\tilde{l},\tilde{k}\right) = \left|\tilde{m} - \tilde{l}\right| + \left|\tilde{n} - \tilde{k}\right|,} & (8b)\end{matrix}$ independently of the chosen sampling of the wave numbers.

For system identification, typically a cost function penalizing the difference between the microphone signals and their estimates is minimized. One way to realize the invention is to modify an adaptation algorithm such that the obtained weights of the wave field component couplings are also considered. This can be done by simply adding an additional term to the cost function which grows with an increasing D( . . . ), resulting in

$\begin{matrix}{\int_{-\infty}^{\infty}\left|\hat{H}_{\tilde{m},\tilde{l}}(j\omega)\right|^{2} C\left(D\left(\tilde{m},\tilde{l}\right)\right) d\omega,} & (8c)\end{matrix}$

$\begin{matrix}{\int_{-\infty}^{\infty}\left|\hat{H}_{\tilde{m},\tilde{n},\tilde{l},\tilde{k}}(j\omega)\right|^{2} C\left(D\left(\tilde{m},\tilde{n},\tilde{l},\tilde{k}\right)\right) d\omega,} & (8d)\end{matrix}$

$\begin{matrix}{\int_{-\infty}^{\infty}\left|\hat{H}\left(\tilde{k}_{x}^{(d)},\tilde{k}_{y}^{(d)},\tilde{k}_{z}^{(d)},\tilde{k}_{x}^{(x)},\tilde{k}_{y}^{(x)},\tilde{k}_{z}^{(x)},j\omega\right)\right|^{2} C\left(D\left(\tilde{k}_{x}^{(d)},\tilde{k}_{y}^{(d)},\tilde{k}_{z}^{(d)},\tilde{k}_{x}^{(x)},\tilde{k}_{y}^{(x)},\tilde{k}_{z}^{(x)}\right)\right) d\omega} & (8e)\end{matrix}$

for circular harmonics, spherical harmonics and plane waves, respectively. Here, $\hat{H}_{\tilde{m},\tilde{l}}(j\omega)$ represents the estimate of $\tilde{H}_{\tilde{m},\tilde{l}}(j\omega)$, $\hat{H}_{\tilde{m},\tilde{n},\tilde{l},\tilde{k}}(j\omega)$ represents the estimate of $\tilde{H}_{\tilde{m},\tilde{n},\tilde{l},\tilde{k}}(j\omega)$, and $\hat{H}(\tilde{k}_{x}^{(d)},\tilde{k}_{y}^{(d)},\tilde{k}_{z}^{(d)},\tilde{k}_{x}^{(x)},\tilde{k}_{y}^{(x)},\tilde{k}_{z}^{(x)},j\omega)$ represents the estimate of $\tilde{H}(\tilde{k}_{x}^{(d)},\tilde{k}_{y}^{(d)},\tilde{k}_{z}^{(d)},\tilde{k}_{x}^{(x)},\tilde{k}_{y}^{(x)},\tilde{k}_{z}^{(x)},j\omega)$. The cost function C(x) is a monotonically increasing function.
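A discrete-frequency version of the additional term for circular harmonics may be sketched as follows. This is only an illustration under stated assumptions: the integral is replaced by a plain sum over uniformly spaced frequency bins (so the result is scaled by the bin width), the terms of all mode pairs are added up, and the names `coupling_penalty` and `weight` as well as the default choice C(x)=x are hypothetical.

```python
import numpy as np


def coupling_penalty(H_hat, orders_m, orders_l, weight=lambda d: d):
    """Sum the penalty terms of the circular-harmonics case over all mode pairs.

    H_hat    : complex array of shape (len(orders_m), len(orders_l), n_bins)
               holding estimated mode couplings on a uniform frequency grid
    orders_m : mode orders of the measured wave field
    orders_l : mode orders of the free-field description
    weight   : monotonically increasing function C (assumed default: identity)
    """
    orders_m = np.asarray(orders_m)
    orders_l = np.asarray(orders_l)
    # Distance measure D(m, l) = |m - l| for every pair of mode orders.
    D = np.abs(orders_m[:, None] - orders_l[None, :])
    C = np.asarray(weight(D), dtype=float)
    # Weighted energy of the couplings, summed over modes and frequency bins.
    return float(np.sum(C[:, :, None] * np.abs(H_hat) ** 2))
```

With the identity weight, couplings between modes of equal order contribute nothing, whereas a coupling between modes whose orders differ by four is weighted four times as heavily as one between neighboring orders.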

In the following, the concepts on which embodiments rely, and the embodiments themselves, are described in more detail.

At first, the problem of multichannel acoustic echo cancellation (MCAEC) is briefly reviewed.

AEC uses observations of loudspeaker and microphone signals to estimate the loudspeaker echo in the microphone signals. Although extraction of the desired signals of the local acoustic scene is the actual motivation for AEC, it will be assumed for the analysis that the local sources are inactive. This does not limit the applicability of the obtained results, since in most practical systems the adaptation of the filters is stalled during activity of local desired sources (e.g. in a double-talk situation) [16]. For the actual detection of double-talk, see, e.g., [17].

Now, the signal model is presented. The structure of a wave-domain AEC according to FIG. 3 will be described. There are two types of signal representations used in this context: so-called point observation signals, corresponding to sound pressure measured at points in space, and wave-domain representations, corresponding to wave-field components which can be observed over a continuum in space. The latter will be discussed later on.

At first, point observation signals will be described. For block-wise processing of signals, vectors of signal samples are introduced with the block-time index n as argument. The reproduction system G_(RS) shown in FIG. 3 is not part of the AEC system, but is considered for describing the nonuniqueness problem below.

As input for the reproduction system we have a set of N_(S) uncorrelated source signals {circumflex over (x)}_(s)(k) captured by

$\begin{matrix}{\hat{x}(n) = \left(\hat{x}_{0}^{T}(n),\hat{x}_{1}^{T}(n),\ldots,\hat{x}_{N_{S}-1}^{T}(n)\right)^{T},\quad \hat{x}_{s}(n) = \left(\hat{x}_{s}(nL_{B}-L_{S}+1),\hat{x}_{s}(nL_{B}-L_{S}+2),\ldots,\hat{x}_{s}(nL_{B})\right)^{T},\quad s = 0,1,\ldots,N_{S}-1,} & (9)\end{matrix}$

where ·^(T) denotes the transposition, s denotes the source index, L_(B) denotes the relative block shift between data blocks, L_(S) denotes the length of the individual components {circumflex over (x)}_(s)(n), and {circumflex over (x)}_(s)(k) denotes a time-domain signal sample of source s at the time instant k. The loudspeaker signals are then determined by the reproduction system according to

$\begin{matrix}{x(n) = G_{RS}\,\hat{x}(n),} & (10a)\end{matrix}$

where x(n) can be decomposed into

$\begin{matrix}{x(n) = \left(x_{0}^{T}(n),x_{1}^{T}(n),\ldots,x_{N_{L}-1}^{T}(n)\right)^{T},\quad x_{\lambda}(n) = \left(x_{\lambda}(nL_{B}-L_{X}+1),x_{\lambda}(nL_{B}-L_{X}+2),\ldots,x_{\lambda}(nL_{B})\right)^{T},\quad \lambda = 0,1,\ldots,N_{L}-1,} & (10b)\end{matrix}$

with the loudspeaker index λ, the number of loudspeakers N_(L), and the length L_(X) of the individual components x_(λ)(n) which capture the time-domain samples x_(λ)(k) of the respective loudspeaker signals. The L_(X)·N_(L)×L_(S)·N_(S) matrix G_(RS) describes an arbitrary linear reproduction system, e.g., a WFS system, whose output signals are described by

$\begin{matrix}{x_{\lambda}(k) = \sum\limits_{s = 0}^{N_{S} - 1}\sum\limits_{\kappa = 0}^{L_{G} - 1}\hat{x}_{s}\left(k - \kappa\right)g_{\lambda,s}(\kappa),} & (11)\end{matrix}$ where g_(λ,s)(k) is the impulse response of length L_(G) used by the reproduction system to obtain the contribution of source s to the loudspeaker signal λ.

The loudspeaker signals are then fed to the LEMS. The N_(M) microphone signals are described by the vector d(n) which is given by

$\begin{matrix}{d(n) = Hx(n),} & (12a)\end{matrix}$

$\begin{matrix}{d(n) = \left(d_{0}^{T}(n),d_{1}^{T}(n),\ldots,d_{N_{M}-1}^{T}(n)\right)^{T},} & (12b)\end{matrix}$

$\begin{matrix}{d_{\mu}(n) = \left(d_{\mu}(nL_{B}-L_{B}+1),d_{\mu}(nL_{B}-L_{B}+2),\ldots,d_{\mu}(nL_{B})\right)^{T},\quad \mu = 0,1,\ldots,N_{M}-1,} & (12c)\end{matrix}$

where μ is the index of the microphone, d_(μ)(k) a time-domain sample of the microphone signal μ, and H describes the LEMS. The L_(B)·N_(M)×L_(X)·N_(L) matrix H is structured such that

$\begin{matrix}{d_{\mu}(k) = \sum\limits_{\lambda = 0}^{N_{L} - 1}\sum\limits_{\kappa = 0}^{L_{H} - 1}x_{\lambda}\left(k - \kappa\right)h_{\mu,\lambda}(\kappa),} & (13)\end{matrix}$ where h_(μ,λ)(k) is the discrete-time impulse response of the LEMS from loudspeaker λ to microphone μ of length L_(H). During double-talk, d(n) would also contain the signal of the local acoustic scene. From (9) to (13) follow L_(X)≥L_(B)+L_(H)−1 and L_(S)=L_(X)+L_(G)−1 with the given lengths L_(G), L_(H), and L_(B). The option to choose L_(X) larger than L_(B)+L_(H)−1 is needed to maintain consistency in the notation within this paper.

Now, wave-domain signal representations are explained which are specific to WDAF. The tilde will be used to distinguish the wave-domain representations from others in this paper. From the loudspeaker signals we obtain the so-called free-field description {tilde over (x)}(n) using transform T₁:

$\begin{matrix}{\tilde{x}(n) = T_{1}x(n).} & (14a)\end{matrix}$

The vector {tilde over (x)}(n) exhibits the same structure as x(n), replacing the segments x_(λ)(n) by {tilde over (x)}_(l)(n) and the components x_(λ)(k) by {tilde over (x)}_(l)(k) being the time-domain samples of the N_(L) individual wave field components with the wave field component index l. From the microphone signals the so-called measured wave field will be obtained in the same way using transform T₂:

$\begin{matrix}{\tilde{d}(n) = T_{2}d(n).} & (14b)\end{matrix}$

Here, {tilde over (d)}(n) is structured like d(n) with the segments d_(μ)(n) replaced by {tilde over (d)}_(m)(n) and the components d_(μ)(k) replaced by {tilde over (d)}_(m)(k) denoting the time-domain samples of the N_(M) individual wave field components of the measured wave field, indexed by m. The frequency-independent unitary transforms T₁ and T₂ will be derived later. Replacing them with identity matrices of the appropriate dimensions leads to the description of an MCAEC without a spatial transform as a special case of a WDAF AEC [15]. This type of AEC will be referred to as conventional AEC in the following.

In the wave domain, {tilde over (y)}(n) is obtained as an estimate for {tilde over (d)}(n) by using

$\begin{matrix}{\tilde{y}(n) = \tilde{H}(n)\tilde{x}(n),} & (14c)\end{matrix}$

where {tilde over (y)}(n) is structured like {tilde over (d)}(n) and the L_(B)·N_(M)×L_(X)·N_(L) matrix {tilde over (H)}(n) is a wave-domain estimate for H so that the time-domain samples comprised by {tilde over (y)}(n) are given through

$\begin{matrix}{\tilde{y}_{m}(k) = \sum\limits_{l = 0}^{N_{L} - 1}\sum\limits_{\kappa = 0}^{L_{H} - 1}\tilde{x}_{l}\left(k - \kappa\right)\tilde{h}_{m,l}\left(n,\kappa\right).} & (14d)\end{matrix}$

Again, {tilde over (h)}_(m,l)(n,k) describe impulse responses of length L_(H) which are (in contrast to h_(μ,λ)(k)) also dependent on the block index n. This is necessary since later, an iterative update of those impulse responses will be described. Please note that {tilde over (h)}_(m,l)(n,k) and h_(μ,λ)(k) are assumed to have the same length for the analysis conducted here. As a consequence, the effects of a possibly unmodeled impulse response tail [16] are not considered. Finally, the error in the wave domain can be defined by

$\begin{matrix}{\tilde{e}(n) = \tilde{d}(n) - \tilde{y}(n),} & (15)\end{matrix}$

which shares the structure with {tilde over (d)}(n), comprising the segments {tilde over (e)}_(m)(n). These signals can be transformed back to error signals compatible with the microphone signals d(n) by using

$\begin{matrix}{e(n) = T_{2}^{-1}\tilde{e}(n).} & (16)\end{matrix}$

An AEC aims for a minimization of the error e(n) with respect to a suitable norm. The most commonly used norm in this regard is the Euclidean norm ∥e(n)∥₂. This motivated the choice of a unitary matrix T₂ leading to an equivalent error criterion in the wave domain and for the point observation signals, ∥e(n)∥₂=∥{tilde over (e)}(n)∥₂. The so-called “Echo Return Loss Enhancement” (ERLE) provides a measure for the achieved echo cancellation. During inactivity of the local acoustic sources it can be defined by

$\begin{matrix}{\mathrm{ERLE}(n) = 10\log_{10}\left(\frac{\left\|\tilde{d}(n)\right\|_{2}^{2}}{\left\|\tilde{e}(n)\right\|_{2}^{2}}\right) = 10\log_{10}\left(\frac{\left\|d(n)\right\|_{2}^{2}}{\left\|e(n)\right\|_{2}^{2}}\right).} & (17)\end{matrix}$
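The ERLE of one signal block can be computed as in the following minimal sketch. The function name `erle_db` is an assumption for illustration; the inputs are the stacked microphone and error vectors of a single block, in either the point-observation or the wave domain.

```python
import numpy as np


def erle_db(d, e):
    """Echo Return Loss Enhancement of one block in dB, following (17).

    d : stacked microphone signal vector of the block
    e : stacked error signal vector of the same block
    """
    d = np.asarray(d)
    e = np.asarray(e)
    # Ratio of the squared Euclidean norms of microphone and error signals.
    return 10.0 * np.log10(np.sum(np.abs(d) ** 2) / np.sum(np.abs(e) ** 2))
```

Because the Euclidean norm is preserved by a unitary transform, applying the same unitary matrix to both arguments leaves the returned value unchanged, which mirrors the equality of the two expressions in (17).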

Now the nonuniqueness problem for the MCAEC, which is already known from the stereophonic AEC, will be briefly reviewed. After determining the conditions for the occurrence of the nonuniqueness problem, it will be explained why the residual echo is not the only important measure for an AEC and that the mismatch of the identified impulse responses to the true impulse responses of the LEMS has to be considered as well.

At first, the conditions for the occurrence of the nonuniqueness problem are determined by considering the idealized case of an AEC where the residual echo vanishes. By using (12a), (14a), (14b), and (15) the error may be written as

$\begin{matrix}{\tilde{e}(n) = \left(T_{2}H - \tilde{H}(n)T_{1}\right)x(n).} & (18)\end{matrix}$

In the ideal case the LEMS can be perfectly modeled and local acoustic sources are inactive. As a consequence, an optimal solution in the sense of minimizing any norm ∥{tilde over (e)}(n)∥ also achieves {tilde over (e)}(n)=0. Under these conditions, the nonuniqueness problem may be discussed independently of the algorithm used for system description.

If {tilde over (e)}(n)=0 is required for all possible x(n), the unique solution

$\begin{matrix}{\tilde{H}(n)T_{1} = T_{2}H} & (19)\end{matrix}$

is obtained, where {tilde over (H)}(n) fully identifies the room described by H in the vector space spanned by T₂. This will be referred to as the perfect solution in the following, which can be identified in theory given the observed vectors d(n) for a sufficiently large set of linearly independent vectors x(n). However, according to (10a) x(n) originates from {circumflex over (x)}(n), so that the set of observable vectors x(n) is limited by G_(RS). Using (10a) and (18) we obtain

$\begin{matrix}{\tilde{e}(n) = \left(T_{2}H - \tilde{H}(n)T_{1}\right)G_{RS}\,\hat{x}(n),} & (20)\end{matrix}$

so that requiring {tilde over (e)}(n)=0 for all {circumflex over (x)}(n) no longer guarantees a unique solution for {tilde over (H)}(n). In the following, conditions for nonunique solutions are investigated. Without loss of generality we may assume L_(B)=1 leading to L_(X)=L_(H) for the remainder of this section, leaving no constraints on the structures of {tilde over (H)}(n) and H. Obviously, the matrix G_(RS) has a rank of min{N_(L)·L_(H), N_(S)·(L_(H)+L_(G)−1)} when being full-rank, as we will assume in the following. Whenever this rank is less than the column dimension of the term (T₂H−{tilde over (H)}(n)T₁), there are multiple solutions (T₂H−{tilde over (H)}(n)T₁)≠0 fulfilling {tilde over (e)}(n)=0, and the problem of identifying H is underdetermined. So the solution is only unique if

$\begin{matrix}{N_{L} \cdot L_{H} \leq N_{S} \cdot \left(L_{H} + L_{G} - 1\right).} & (21)\end{matrix}$
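Condition (21) is easy to evaluate for a given setup. The following sketch only restates the inequality; the function name `solution_is_unique` and the example numbers in the usage note are assumptions chosen for illustration.

```python
def solution_is_unique(N_L, N_S, L_H, L_G):
    """Return True if condition (21) holds.

    N_L : number of loudspeakers
    N_S : number of uncorrelated source signals
    L_H : length of the LEMS impulse responses
    L_G : length of the reproduction system impulse responses
    """
    # The rank of G_RS must reach the column dimension N_L * L_H.
    return N_L * L_H <= N_S * (L_H + L_G - 1)
```

For instance, a hypothetical setup with 48 loudspeakers, a single source, L_H=4096 and L_G=135 violates the condition, while any setup with at least as many sources as loudspeakers fulfills it.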

It can be seen that the relation of the number of used loudspeakers and active signal sources is the most decisive property regarding the nonuniqueness problem. Whenever there are at least as many source signals as loudspeakers, i.e., N_(S)≥N_(L), the nonuniqueness problem does not occur. On the other hand, a long impulse response of the reproduction system may also prevent the nonuniqueness problem from occurring. This result generalizes the results of Huang et al. [16] who analyzed the case L_(H)=L_(G), N_(S)=1 for a least squares minimization of {tilde over (e)}(n). For reproduction systems like WFS, N_(L)>>N_(S) and a limited L_(G) are typical parameters, so the nonuniqueness problem is relevant in most practical situations.

Now, the consequences of the nonuniqueness problem are discussed. Since all solutions achieving {tilde over (e)}(n)=0 cancel the echo optimally, it is not immediately evident why obtaining a solution different from the perfect solution can be problematic. This changes when regarding the reproduction system G_(RS) as being time-variant in practice. As an example, consider a WFS system synthesizing a plane wave with a suddenly changing incidence angle, modeled by two different matrices G_(RS), one for the first incidence angle and another for the second. When the problem of finding {tilde over (H)}(n) is underdetermined, an adaptation algorithm will converge to one of many solutions for each of the two G_(RS). Without further objectives than minimizing {tilde over (e)}(n), these solutions may be arbitrarily different from one another. So a solution found for one G_(RS) is not optimal for another G_(RS) and an instantaneous breakdown in ERLE at the time instant of change is the consequence [5,11].

This breakdown in ERLE may become quite significant in practice. There, noise, interference, double-talk, an unsuitable choice of parameters, or an insufficient model will cause divergence. Consequently, the adaptation algorithm may be driven to virtually any of the possible solutions. As the solutions for {tilde over (H)}(n) given a specific G_(RS) do not form a bounded set whenever the nonuniqueness problem occurs, a solution for one G_(RS) may be arbitrarily different from any of the solutions for another G_(RS). This makes the breakdown in ERLE in fact uncontrollable and constitutes a major problem for the robustness of an MCAEC.

If the perfect solution is obtained, there will be no breakdown in ERLE for any change of G_(RS), as this solution is independent of G_(RS). This makes solutions in the vicinity of the perfect solution favorable in order to reduce the amount of ERLE loss following changes of G_(RS). The normalized misalignment is a metric to determine the distance of a solution from the perfect solution given in (19). For the system described here, this measure can be formulated as follows:

$\begin{matrix}{\Delta_{H}(n) = 10\log_{10}\left(\frac{\left\|T_{2}H - \tilde{H}(n)T_{1}\right\|_{F}^{2}}{\left\|T_{2}H\right\|_{F}^{2}}\right),} & (22)\end{matrix}$ where ∥·∥_(F) stands for the Frobenius norm. The smaller the normalized misalignment, the smaller is the expected breakdown in ERLE when G_(RS) changes. Still, the minimization of the error signal is the most important criterion regarding the perceived echo but, in order to increase the robustness of an AEC, minimization of normalized misalignment remains the ultimate goal. Since one cannot observe H, a direct minimization of the normalized misalignment is not possible. Hence, a method to heuristically minimize this distance is presented in this work.
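In a simulation, where the true H is known, the normalized misalignment (22) can be evaluated directly. The following sketch assumes that all four matrices are available as NumPy arrays of compatible shapes; the function name `normalized_misalignment_db` is hypothetical.

```python
import numpy as np


def normalized_misalignment_db(H, H_tilde, T1, T2):
    """Normalized misalignment in dB according to (22).

    H       : true LEMS matrix (only known in simulations)
    H_tilde : wave-domain estimate of the LEMS
    T1, T2  : loudspeaker-side and microphone-side transform matrices
    """
    reference = T2 @ H                   # perfect solution, see (19)
    deviation = reference - H_tilde @ T1
    num = np.linalg.norm(deviation, "fro") ** 2
    den = np.linalg.norm(reference, "fro") ** 2
    return 10.0 * np.log10(num / den)
```

As a plausibility check, with identity transforms and an estimate equal to 0.9 times the true matrix, the deviation has one tenth of the reference norm and the function returns approximately −20 dB.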

By considering (20) we may calculate the number of singular values of {tilde over (H)}(n) that can be uniquely determined when requiring {tilde over (e)}(n)=0 for a given number of sources N_(S). Assuming all singular values of {tilde over (H)}(n) to have an equal influence on Δ_(H)(n) and all non-unique values to be zero, a coarse approximation of the lower bound for the normalized misalignment can be obtained. From (20) and (22) we obtain

$\begin{matrix}{{\min\left\{ {\Delta_{H}(n)} \right\}} \approx {10\;{\log_{10}\left( {1 - \frac{N_{S}\left( {L_{H} + L_{G} - 1} \right)}{N_{L}L_{H}}} \right)}}} & (23)\end{matrix}$given that the observed signals provide the only available informationabout the LEMS.
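The coarse bound (23) can be evaluated numerically as sketched below. The function name `misalignment_lower_bound_db` is an assumption, and returning negative infinity when condition (21) is fulfilled is a convention chosen for this example, since the argument of the logarithm is then no longer positive.

```python
import math


def misalignment_lower_bound_db(N_L, N_S, L_H, L_G):
    """Coarse lower bound of the normalized misalignment in dB, see (23)."""
    # Fraction of uniquely determined singular values.
    ratio = N_S * (L_H + L_G - 1) / (N_L * L_H)
    if ratio >= 1.0:
        # Condition (21) holds: the bound does not limit the misalignment.
        return -math.inf
    return 10.0 * math.log10(1.0 - ratio)
```

For a hypothetical stereo setup with one source, L_H=100 and L_G=1, half of the singular values are determined and the bound is about −3 dB.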

In the following, the wave-domain signal and system representations are provided. An explicit definition of the required transforms is given and the exploited wave-domain properties of the LEMS are described.

At first, the wave-domain signal representations as key concepts of WDAF are presented. First the transforms to the wave domain will be introduced, so that the properties of the LEMS in the wave domain can then be discussed. For the derivation of the transforms, a fundamental solution of the wave equation will be used. Since this solution is given in the continuous frequency domain, compatibility to the discrete-time and discrete-frequency signal representations as described above should be achieved.

At first, the transforms of the point observation signals to the wave domain are derived. There are a variety of fundamental solutions of the wave equation available for the wave-domain signal representations. Some examples are plane waves [13], spherical harmonics, or cylindrical harmonics [18]. A choice can be made by considering the array setup, which is a concentric planar setup of two uniform circular arrays within this work, as it is depicted in FIG. 2. For this setup, the positions of the N_(L) loudspeakers may be described in polar coordinates by a circle with radius R_(L) and the angles determined by the loudspeaker index λ:

$\begin{matrix}{{{\overset{\rightarrow}{l}}_{\lambda} = \left( {{\lambda \cdot \frac{2\;\pi}{N_{L}}},R_{L}} \right)^{T}},{\lambda = 0},\ldots\mspace{14mu},{N_{L} - 1.}} & (24)\end{matrix}$

In the same way the positions of the N_(M) microphones positioned on a circle with radius R_(M) are given by

$\begin{matrix}{{{\overset{\rightarrow}{m}}_{\mu} = \left( {{\mu \cdot \frac{2\;\pi}{N_{M}}},R_{M}} \right)^{T}},{\mu = 0},\ldots\mspace{14mu},{N_{M} - 1},} & (25)\end{matrix}$ with the microphone index μ. Limiting the considerations to two dimensions, the sound pressure may be described in the vicinity of the microphone array using so-called circular harmonics [18]

$\begin{matrix}{P\left(\alpha,\varrho,j\omega\right) = \sum\limits_{m^{\prime} = -\infty}^{\infty}\left(\tilde{P}_{m^{\prime}}^{(1)}\left(j\omega\right)\mathcal{H}_{m^{\prime}}^{(1)}\left(\frac{\omega}{c}\varrho\right) + \tilde{P}_{m^{\prime}}^{(2)}\left(j\omega\right)\mathcal{H}_{m^{\prime}}^{(2)}\left(\frac{\omega}{c}\varrho\right)\right)e^{jm^{\prime}\alpha},} & (26)\end{matrix}$ where $\mathcal{H}_{m^{\prime}}^{(1)}(x)$ and $\mathcal{H}_{m^{\prime}}^{(2)}(x)$ are Hankel functions of the first and second kind and order m′, respectively, ω=2πf denotes the angular frequency, c is the speed of sound, j is used as the imaginary unit, and $\varrho$ and α describe a point in polar coordinates as shown in FIG. 2. We will refer to the wave field components indexed by m′ in (26) et sqq. as modes. The quantities {tilde over (P)}_(m′) ⁽¹⁾(jω) and {tilde over (P)}_(m′) ⁽²⁾(jω) may be interpreted as spectra of an incoming and an outgoing wave (relative to the origin). Assuming the absence of acoustic sources within the microphone array, {tilde over (P)}_(m′) ⁽²⁾(jω) is determined by {tilde over (P)}_(m′) ⁽¹⁾(jω) and the scatterer within the microphone array. Consequently, we may limit our considerations to {tilde over (P)}_(m′) ^((s))(jω) describing the superposition of {tilde over (P)}_(m′) ⁽¹⁾(jω) and {tilde over (P)}_(m′) ⁽²⁾(jω):

$\begin{matrix}{\tilde{P}_{m^{\prime}}^{(s)}\left(j\omega\right)B_{m^{\prime}}\left(\frac{\omega}{c}\varrho\right) = \tilde{P}_{m^{\prime}}^{(1)}\left(j\omega\right)\mathcal{H}_{m^{\prime}}^{(1)}\left(\frac{\omega}{c}\varrho\right) + \tilde{P}_{m^{\prime}}^{(2)}\left(j\omega\right)\mathcal{H}_{m^{\prime}}^{(2)}\left(\frac{\omega}{c}\varrho\right),} & (27)\end{matrix}$ where B_(m′)(x) is dependent on the scatterer within the microphone array. If no scatterer is present, B_(m′)(x) is equal to the ordinary Bessel function of the first kind J_(m′)(x) of order m′. The solution for a cylindrical baffle can be found in [19].

Now, transform T₂ is explained in more detail. The transform T₂ is used to obtain a wave-domain description of the sound pressure measured by the microphones. Using (26) and (27) we obtain {tilde over (P)}_(m′) ^((s))(jω) as a Fourier series coefficient according to

$\begin{matrix}{B_{m^{\prime}}\left(\frac{\omega}{c}R_{M}\right)\tilde{P}_{m^{\prime}}^{(s)}\left(j\omega\right) = \frac{1}{2\pi}\int_{0}^{2\pi}P\left(\alpha,R_{M},j\omega\right)e^{-jm^{\prime}\alpha}\,d\alpha.} & (28)\end{matrix}$

In contrast to [13], where sound velocity and sound pressure were used, we only need to consider the sound pressure on a circle for (28) as both {tilde over (P)}_(m′) ⁽¹⁾(jω) and {tilde over (P)}_(m′) ⁽²⁾(jω) are replaced by {tilde over (P)}_(m′) ^((s))(jω). However, we can only sample the wave field at the N_(M) discrete points described by {right arrow over (m)}_(μ), so that we approximate the integral in (28) by a sum and obtain

$\begin{matrix}{{{{B_{m^{\prime}}\left( {\frac{\omega}{c}R_{M}} \right)}{{\overset{\sim}{P}}_{m^{\prime}}^{(s)}\left( {j\;\omega} \right)}} \approx {\frac{1}{N_{M}}{\sum\limits_{\mu = 0}^{N_{M} - 1}{{{\hat{P}}_{\mu}^{(d)}\left( {j\;\omega} \right)}e^{{- j}\; m^{\prime}\mu\;\frac{2\pi}{N_{M}}}}}}},} & (29)\end{matrix}$ where {circumflex over (P)}_(μ) ^((d))(jω) denotes the spectrum of the sound pressure measured by microphone μ. The superscript (d) refers to d(n) as introduced above. We will use the right-hand side of (29) as the signal representation of the microphone signals in the wave domain and obtain

$\begin{matrix}{{{{\overset{\sim}{P}}_{m^{\prime}}^{(d)}\left( {j\;\omega} \right)}:={\frac{1}{N_{M}}{\sum\limits_{\mu = 0}^{N_{M} - 1}{{{\hat{P}}_{\mu}^{(d)}\left( {j\;\omega} \right)}e^{{- j}\; m^{\prime}\mu\;\frac{2\pi}{N_{M}}}}}}},} & (30)\end{matrix}$ which is referred to as the measured wave field. The aliasing due to the spatial sampling as well as the term $B_{m^{\prime}}\left( {\frac{\omega}{c}R_{M}} \right)$ is neglected in (30) as it will later be modeled by the wave-domain LEMS. Considering (30) as T₂, T₂ is equivalent to the spatial DFT and therefore unitary up to a scaling factor. Due to the spatial sampling, the sequence of modes {tilde over (P)}_(m′) ^((d))(jω) is periodic in m′ with a period of N_(M) orders, so that we can restrict our view to the modes m′=−N_(M)/2+1, . . . , N_(M)/2 without loss of generality.
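The spatial DFT of (30) can be sketched with NumPy as follows. The function name `measured_wave_field` is an assumption, and the sketch presumes an even number of microphones so that the mode orders run from −N_M/2+1 to N_M/2.

```python
import numpy as np


def measured_wave_field(P_mic):
    """Transform microphone spectra to the wave domain following (30).

    P_mic : complex array whose first axis has length N_M (even, assumed),
            holding the spectra measured by the microphones mu = 0..N_M-1
    Returns the mode orders m' and the corresponding wave-domain spectra.
    """
    P_mic = np.asarray(P_mic)
    N_M = P_mic.shape[0]
    mu = np.arange(N_M)
    orders = np.arange(-(N_M // 2) + 1, N_M // 2 + 1)
    # Spatial DFT with the 1/N_M scaling used in (30).
    W = np.exp(-2j * np.pi * np.outer(orders, mu) / N_M) / N_M
    return orders, W @ P_mic
```

Feeding a sound pressure distribution that consists of a single mode of order 2 along the array yields a unit value for m′=2 and zeros for all other orders.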

Now, transform T₁ is presented in more detail. The transform T₁, as derived in this section, is used to obtain a wave-domain description of the sound field at the position of the microphone array as it would be created by the loudspeakers under free-field conditions. One possibility to define T₁ is to simulate the free-field point-to-point propagation between loudspeakers and microphones and then transform the obtained signal according to T₂, as it was proposed in [13]. This approach has the advantage of implicitly modeling the aliasing by the microphone array, but it also has some disadvantages: the number of resulting wave field components is limited by the number of microphones and not by the (typically higher) number of loudspeakers, and the resulting transform is frequency dependent. As we aim at frequency-independent invertible transforms, we follow an alternative approach, where we determine the free-field wave field components excited by the loudspeakers at the microphone array circumference independently of the actual number of microphones. Unfortunately, determining the desired free-field sound pressure with the three-dimensional Green's function does not lead to a result that can be straightforwardly transformed using (28). So, we describe the sound pressure at the position of the microphones by approximating the wave propagation from the loudspeakers to the microphones in two stages: a three-dimensional wave propagation from the loudspeakers to the origin and a two-dimensional wave propagation along the microphone array located at the origin. As the Green's functions from the loudspeakers to the origin are not dependent on the microphone positions, the integral in (28) has only to be evaluated for the two-dimensional propagation along the microphone array, which is conveniently solvable.

The three-dimensional wave propagation from the individual loudspeaker positions to the center of the microphone array, i.e., the origin of the coordinate system, is described by the free-field Green's function [20]

$\begin{matrix}{{G\left( \overset{\rightarrow}{0} \middle| {\overset{\rightarrow}{l}}_{\lambda} \right)} = {\frac{e^{{- j}\; R_{L}\frac{\omega}{c}}}{R_{L}}.}} & (31)\end{matrix}$

For the two-dimensional wave propagation along the microphone array the loudspeaker contributions are regarded as plane waves, which is valid if [21]

$\begin{matrix}{{R_{L} > \frac{8\; R_{M}^{2}\omega}{2\;\pi\; c}},{R_{M} ⪡ {R_{L}.}}} & (32)\end{matrix}$

The propagation of a loudspeaker contribution along the microphone array is approximated as a plane wave propagation with the incidence angle φ and described by

$\begin{matrix}{G_{PW}\left(\overset{\rightarrow}{x},\varphi,j\omega\right) = e^{j\varrho\cos\left(\alpha - \varphi\right)\frac{\omega}{c}}.} & (33)\end{matrix}$

Using $\varphi = \lambda \cdot \frac{2\pi}{N_{L}}$, the sound pressure P(α,R_(M), jω) in the vicinity of the microphone array may be approximated by a superposition of plane waves

$\begin{matrix}{P\left(\alpha,R_{M},j\omega\right) \approx \sum\limits_{\lambda = 0}^{N_{L} - 1}\hat{P}_{\lambda}^{(x)}\left(j\omega\right) \cdot G\left(\overset{\rightarrow}{0}\middle|\overset{\rightarrow}{l}_{\lambda},j\omega\right) \cdot G_{PW}\left(\overset{\rightarrow}{x},\lambda\frac{2\pi}{N_{L}},j\omega\right)} & (34) \\ {\approx \sum\limits_{\lambda = 0}^{N_{L} - 1}\hat{P}_{\lambda}^{(x)}\left(j\omega\right)\frac{e^{j\left(R_{M}\cos\left(\alpha - \lambda\frac{2\pi}{N_{L}}\right) - R_{L}\right)\frac{\omega}{c}}}{R_{L}},} & (35)\end{matrix}$ where {circumflex over (P)}_(λ) ^((x))(jω) is the spectrum of the sound field emitted by loudspeaker λ and {right arrow over (x)}=(α,R_(M))^(T). Again, the superscript (x) referring to x(n), as explained above, is used.

As we derive transform T₁ using the free-field assumption, B_(m′)(x)=J_(m′)(x) holds for this derivation. We insert (35) into (28), replace the index m′ by l′ and use the Jacobi-Anger expansion [22] to derive

$\int_{0}^{2\pi}e^{jR_{M}\cos\left(\alpha - \lambda\frac{2\pi}{N_{L}}\right)\frac{\omega}{c}}e^{-jl^{\prime}\alpha}\,d\alpha = \sum\limits_{\nu = -\infty}^{\infty}j^{\nu}\mathcal{J}_{\nu}\left(R_{M}\frac{\omega}{c}\right)e^{-j\nu\lambda\frac{2\pi}{N_{L}}}\int_{0}^{2\pi}e^{j\left(\nu - l^{\prime}\right)\alpha}\,d\alpha,$ which is used to transform (35) to the wave domain:

$\begin{matrix}{\tilde{P}_{l^{\prime}}\left(j\omega\right) = j^{l^{\prime}}\sum\limits_{\lambda = 0}^{N_{L} - 1}\hat{P}_{\lambda}^{(x)}\left(j\omega\right)\frac{e^{-j\left(R_{L}\frac{\omega}{c} + l^{\prime}\lambda\frac{2\pi}{N_{L}}\right)}}{R_{L}}.} & (36)\end{matrix}$

The resulting {tilde over (P)}_(l′)(jω) represents P(α,R_(M), jω) in the wave domain. According to (31), the wave propagation from the loudspeaker positions to the origin is identical for all loudspeakers, so we may leave it to be incorporated into the LEMS model. The same holds for the term j^(l′), so that the spatial DFT for T₁ can be used:

$\begin{matrix}{{{{\overset{\sim}{P}}_{l^{\prime}}^{(x)}\left( {j\;\omega} \right)}:={\sum\limits_{\lambda = 0}^{N_{L} - 1}{{{\hat{P}}_{\lambda}^{(x)}\left( {j\;\omega} \right)}e^{{- j}\; l^{\prime}\lambda\frac{2\;\pi}{N_{L}}}}}},} & (37)\end{matrix}$ where {tilde over (P)}_(l′) ^((x))(jω) is now the free-field description of the loudspeaker signals and l′ denotes the mode order. Again, we limit our view to N_(L) non-redundant components l′=−(N_(L)/2−1), . . . , N_(L)/2 without loss of generality. When obtaining (30) from (29) and (37) from (36), we left the scattering at the microphone array, the delay and the attenuation to be described by the wave-domain LEMS model. For an AEC this is possible because a physical interpretation of the result of the system description is not needed. However, this assumption may change the properties of the LEMS modeled in the wave domain. Fortunately, for the considered array setup, the properties described later remain unchanged.

Now, the LEM system model in the wave domain is explained. The attractive properties motivating the adaptive filtering in the wave domain are discussed in the following and are compared to the properties of the LEM model when considering the point observation signals. We model the LEMS, i.e., the coupling between the sound pressure {circumflex over (P)}_(λ) ^((x))(jω) emitted by the loudspeakers and the sound pressure {circumflex over (P)}_(μ) ^((d))(jω) measured by the microphones, by

$\begin{matrix}{{{{\hat{P}}_{\mu}^{(d)}\left( {j\;\omega} \right)} = {\sum\limits_{\lambda = 0}^{N_{L} - 1}{{{\hat{P}}_{\lambda}^{(x)}\left( {j\;\omega} \right)}{H_{\mu,\lambda}\left( {j\;\omega} \right)}}}},{\mu = 0},1,\ldots\mspace{14mu},{N_{M} - 1},} & (38)\end{matrix}$where H_(μ,λ)(jω) is equal to the Green's function between therespective loudspeaker and the microphone position fulfilling theboundary conditions determined by the enclosing room. Using (30) and(37), it is possible to describe (38) in the wave domain:

$\begin{matrix}{\tilde{P}_{m^{\prime}}^{(d)}\left(j\omega\right) = \sum\limits_{l^{\prime} = -N_{L}/2 + 1}^{N_{L}/2}\tilde{H}_{m^{\prime},l^{\prime}}\left(j\omega\right)\tilde{P}_{l^{\prime}}^{(x)}\left(j\omega\right),} & (39)\end{matrix}$ where {tilde over (H)}_(m′,l′)(jω) describes the coupling of mode l′ in the free-field description and mode m′ in the measured wave field. In the free field we would observe {tilde over (H)}_(m′,l′)(jω)≠0 only for m′=l′, but in a real room other couplings are expected.
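For a single frequency, the wave-domain couplings can be computed from a matrix of point-to-point frequency responses by applying the spatial DFTs of (30) and (37) from both sides. The following is a sketch under assumptions: the helper names `mode_orders`, `spatial_dft` and `wave_domain_couplings` are hypothetical, both array sizes are taken to be even, and the inverse of the unscaled DFT in (37) is written as its conjugate transpose divided by N_L.

```python
import numpy as np


def mode_orders(N):
    """Non-redundant mode orders -N/2+1, ..., N/2 for an even N."""
    return np.arange(-(N // 2) + 1, N // 2 + 1)


def spatial_dft(N):
    """Unscaled spatial DFT matrix with rows ordered by mode order."""
    return np.exp(-2j * np.pi * np.outer(mode_orders(N), np.arange(N)) / N)


def wave_domain_couplings(H):
    """Map H[mu, lambda] of (38) to the mode couplings of (39).

    H : complex array of shape (N_M, N_L) for one frequency
    """
    N_M, N_L = H.shape
    T2 = spatial_dft(N_M) / N_M                # transform of (30)
    T1_inv = spatial_dft(N_L).conj().T / N_L   # inverse of the transform of (37)
    return T2 @ H @ T1_inv
```

If the responses depend only on the angular offset between loudspeaker and microphone, as in a rotationally symmetric arrangement with equally many loudspeakers and microphones, the resulting coupling matrix is diagonal, which corresponds to the free-field observation stated above.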

While a conventional AEC aims to identify H_(μ,λ)(jω) directly, a WDAF AEC aims to identify {tilde over (H)}_(m′,l′)(jω) instead. Whenever identifying H_(μ,λ)(jω) does not lead to a unique solution, the same is the case for {tilde over (H)}_(m′,l′)(jω) regardless of the used transforms. However, while H_(μ,λ)(jω) and {tilde over (H)}_(m′,l′)(jω) are equally powerful in their ability to model the LEMS, their properties differ significantly. For illustration, a sample for H_(μ,λ)(jω) was obtained by measuring the frequency responses between loudspeakers and microphones located in a real room (T₆₀≈0.25 s) using the array setup depicted in FIG. 2 with R_(L)=1.5 m, R_(M)=0.05 m, N_(L)=48, N_(M)=10. From H_(μ,λ)(jω), {tilde over (H)}_(m′,l′)(jω) was calculated by using (30) and (37). The result is shown in FIG. 4, where it can be clearly seen that the couplings of different loudspeakers and microphones are similarly strong, while there are stronger couplings for modes with a small order difference |m′−l′|. This can be explained by the fact that the wave field as excited by the loudspeakers in the free-field case is also the most dominant contribution to the wave field in a real room. This property may be observed for different LEMSs and was already used by the authors for a reduced complexity modeling of the LEMS [23]. It is proposed to exploit this property to improve the system description. As {tilde over (H)}_(m′,l′)(jω) has a reliably predictable structure, we may aim at a solution for the system description where the couplings of modes with a small difference |m′−l′| are stronger than others and reduce the mismatch in a heuristic sense. An adaptation algorithm approaching such a solution is presented later on.

Now, the temporal discretization and approximation of the LEMS model is explained. Compatibility between the continuous frequency-domain representations used above and the discrete quantities will be established. The quantities {circumflex over (P)}_(λ) ^((x))(jω) and {circumflex over (P)}_(μ) ^((d))(jω) may be related to x_(λ)(k) and d_(μ)(k) by a transform to the time domain and appropriate sampling with the sampling frequency f_(x).

The mode orders l′ and m′ in {tilde over (P)}_(l′) ^((x))(jω) and {tilde over (P)}_(m′) ^((d))(jω) may be mapped to the indices of the wave field components {tilde over (x)}_(l)(n) and {tilde over (d)}_(m)(n) through

$\begin{matrix}{l^{\prime} = \left\{ \begin{matrix}l & {{{{for}\mspace{14mu} l} \leq {N_{L}/2}},} \\{l - N_{L}} & {elsewhere}\end{matrix} \right.} & (40) \\{and} & \; \\{m^{\prime} = \left\{ \begin{matrix}m & {{{{for}\mspace{14mu} m} \leq {N_{M}/2}},} \\{m - N_{M}} & {{elsewhere}.}\end{matrix} \right.} & (41)\end{matrix}$
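The index mapping of (40) and (41) can be sketched in a few lines of Python. This is only an illustration: the function name `mode_order` is hypothetical, and the component indices are assumed to run from 0 to N−1, as in the sums used elsewhere in this description.

```python
def mode_order(index: int, n_components: int) -> int:
    """Map a wave-field component index (0 .. n_components-1) to its
    signed mode order, following the case distinction of (40)/(41)."""
    if not 0 <= index < n_components:
        raise ValueError("index out of range")
    # Indices up to N/2 keep their value, the rest wrap to negative orders.
    return index if index <= n_components / 2 else index - n_components


# Example with an assumed N_L = 8 loudspeaker-side components.
orders = [mode_order(l, 8) for l in range(8)]
```

For N_L = 8 this yields the mode orders 0 to 4 followed by −3, −2, −1, which matches the summation range −N_L/2+1 … N_L/2.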

As the transforms T₂ and T₁ are frequency-independent, they may be directly applied to the loudspeaker and microphone signals, resulting in the matrices T₂ and T₁ being equal to scaled DFT matrices with respect to the indices μ and λ:

$\begin{matrix}{{\left\lbrack T_{2} \right\rbrack_{p,q} = {\frac{d\left( {p,q,L_{D}} \right)}{\sqrt{N_{M}}}e^{{- j}{\lfloor{{({p - 1})}/L_{D}}\rfloor}{\lfloor{{({q - 1})}/L_{D}}\rfloor}\frac{2\;\pi}{N_{M}}}}},} & (42) \\{{\left\lbrack T_{1} \right\rbrack_{p,q} = {\frac{d\left( {p,q,L_{X}} \right)}{\sqrt{N_{L}}}e^{{- j}{\lfloor{{({p - 1})}/L_{X}}\rfloor}{\lfloor{{({q - 1})}/L_{X}}\rfloor}\frac{2\;\pi}{N_{L}}}}},} & (43)\end{matrix}$where [M]_(p,q) indexes an entry in M located in row p and column q and

$\begin{matrix}{{d\left( {p,q,L} \right)} = \left\{ {\begin{matrix}1 & {{{if}\mspace{14mu}{{mod}\left( {{p - q},L} \right)}} = 0} \\0 & {elsewhere}\end{matrix}.} \right.} & (44)\end{matrix}$
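As a hedged illustration of (42) to (44), the following NumPy sketch builds such a transform matrix entry by entry and, as an alternative, as a Kronecker product of a scaled DFT matrix with an identity matrix. The function names and the small example sizes are assumptions made for this sketch only.

```python
import numpy as np


def transform_matrix(n_channels: int, block_len: int) -> np.ndarray:
    """Entry-wise construction following (42)/(43) with 1-based p, q."""
    size = n_channels * block_len
    T = np.zeros((size, size), dtype=complex)
    for p in range(1, size + 1):
        for q in range(1, size + 1):
            # d(p, q, L) of (44): couple only equal sample positions.
            if (p - q) % block_len != 0:
                continue
            a = (p - 1) // block_len
            b = (q - 1) // block_len
            T[p - 1, q - 1] = np.exp(-2j * np.pi * a * b / n_channels) / np.sqrt(n_channels)
    return T


def transform_matrix_kron(n_channels: int, block_len: int) -> np.ndarray:
    """Same matrix as a scaled spatial DFT applied per time sample."""
    F = np.fft.fft(np.eye(n_channels)) / np.sqrt(n_channels)
    return np.kron(F, np.eye(block_len))


T1 = transform_matrix(4, 3)   # assumed N_L = 4, L_X = 3
```

Both constructions agree, which shows that the transform is a spatial DFT across the channel index applied independently to every time sample of the block.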

The obtained discrete-time signal representations implicitly define discrete-time system representations. Here, h_(μ,λ)(k) and {tilde over (h)}_(m′,l′)(k) are the discrete-time representations of H_(μ,λ)(jω) and {tilde over (H)}_(m′,l′)(jω), respectively.

In the following, embodiments which employ adaptive filtering are provided. The proposed approach is realized by a modified version of the generalized frequency-domain adaptive filtering (GFDAF) algorithm as described in [14]. At first, this algorithm will briefly be reviewed, and then the modified version will be provided.

At first, GFDAF is explained in more detail. In [14], an efficient adaptation algorithm for the MCAEC was presented. This algorithm shows RLS-like properties and was also used as the basis for the derivation of the algorithm in [15]. For the sake of clarity, this algorithm will be described operating on the signals {tilde over (e)}_(m)(n) separately for each wave field component indexed by m, as separate and joint minimization of ∥{tilde over (e)}_(m)(n)∥₂ ² ∀m coincide [14]. It should be noted that we do not consider the modeled impulse responses to be partitioned as was done in [14], since this is not necessary to describe the proposed approach.

For the signals {tilde over (x)}_(l)(n), {tilde over (e)}_(m)(n), and {tilde over (d)}_(m)(n), the DFT-domain representations are first defined by

$\begin{matrix}{{\underline{\tilde{x}}_{l}(n) = F_{2L_{B}}\,\tilde{x}_{l}(n)},} & (45) \\ {{\underline{\tilde{e}}_{m}(n) = F_{L_{B}}\,\tilde{e}_{m}(n)},} & (46) \\ {{\underline{\tilde{d}}_{m}(n) = F_{L_{B}}\,\tilde{d}_{m}(n)},} & (47)\end{matrix}$ where F_(L) is the L×L DFT matrix. It may further be required that L_(X)=2L_(H) and L_(B)=L_(H). From the signal vector x(n), all wave field components l=0, 1, . . . , N_(L)−1 may be considered for the minimization of ∥{tilde over (e)}_(m)(n)∥₂ for every m, respectively:

$\begin{matrix}{{\underline{X}(n) = \left( {\operatorname{diag}\left\{ {\underline{\tilde{x}}_{0}(n)} \right\},\operatorname{diag}\left\{ {\underline{\tilde{x}}_{1}(n)} \right\},\ldots,\operatorname{diag}\left\{ {\underline{\tilde{x}}_{N_{L} - 1}(n)} \right\}} \right)}.} & (48)\end{matrix}$

For each component m, the error {tilde over (e)}_(m)(n) is obtained using the discrete representation {tilde over (h)} _(m)(n) of {tilde over (h)}_(m,l)(n,k) for this particular m and all l:

$\begin{matrix}{{\underline{\tilde{e}}_{m}(n) = \underline{\tilde{d}}_{m}(n) - \underline{W}_{01}\,\underline{X}(n)\,\underline{W}_{10}\,\underline{\tilde{h}}_{m}(n - 1)},} & (49)\end{matrix}$ where we use the matrices W ₀₁ and W ₁₀ for the time-domain windowing of the signals:

$\begin{matrix}{{\underline{W}_{01} = F_{L_{B}}\left( {0_{L_{B} \times L_{B}},I_{L_{B} \times L_{B}}} \right)F_{2L_{B}}^{- 1}},} & (50) \\ {{\underline{W}_{10} = \operatorname{bdiag}^{N_{L}}\left\{ {F_{2L_{B}}\left( {I_{L_{B} \times L_{B}},0_{L_{B} \times L_{B}}} \right)^{T}F_{L_{B}}^{- 1}} \right\}},} & (51)\end{matrix}$ with the block-diagonal operator bdiag^(N){M} forming a block-diagonal matrix with the matrix M repeated N times on its diagonal.
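The role of the two windowing matrices in (50) and (51) may be easier to see in a small numerical sketch. The NumPy code below is an illustration under assumptions: the function names are hypothetical, and the DFT matrices are taken from `np.fft.fft` applied to an identity matrix.

```python
import numpy as np


def dft_matrix(n: int) -> np.ndarray:
    """Unnormalized n x n DFT matrix F_n."""
    return np.fft.fft(np.eye(n))


def window_01(L_B: int) -> np.ndarray:
    """W_01 of (50): go to the time domain, keep the last L_B of 2 L_B
    samples, and return to the DFT domain."""
    select = np.hstack([np.zeros((L_B, L_B)), np.eye(L_B)])
    return dft_matrix(L_B) @ select @ np.linalg.inv(dft_matrix(2 * L_B))


def window_10(L_B: int, N_L: int) -> np.ndarray:
    """W_10 of (51): zero-pad each of the N_L filters from L_B to 2 L_B
    samples, expressed in the DFT domain."""
    pad = np.vstack([np.eye(L_B), np.zeros((L_B, L_B))])
    block = dft_matrix(2 * L_B) @ pad @ np.linalg.inv(dft_matrix(L_B))
    return np.kron(np.eye(N_L), block)


W01 = window_01(4)        # shape (4, 8)
W10 = window_10(4, 2)     # shape (16, 8)
```

Applying `W01` to the DFT of a block of 2·L_B samples returns the DFT of its last L_B samples, which is the usual overlap-save constraint that removes the circular-convolution part.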

A matrix {tilde over (H)}(n) may be defined by the N_(M) vectors {tilde over (h)} ₀(n), . . . , {tilde over (h)} _(m)(n), . . . , {tilde over (h)} _(N) _(M) ⁻¹(n), which may form the columns of the matrix {tilde over (H)}(n). Thus, the matrix {tilde over (H)}(n) can be considered as a loudspeaker-enclosure-microphone system description of the loudspeaker-enclosure-microphone system. Moreover, a pseudo-inverse matrix H ⁻¹(n) of {tilde over (H)}(n) or the conjugate transpose matrix H ^(T)(n) of {tilde over (H)}(n) may also be considered as a loudspeaker-enclosure-microphone system description of the LEMS.

The vector {tilde over (h)} _(m)(n) can be subdivided into N_(L) parts {tilde over (h)} _(m)(n)=({tilde over (h)} _(m,1)(n), {tilde over (h)} _(m,2)(n), . . . , {tilde over (h)} _(m,N) _(L) (n))^(T), where each vector {tilde over (h)} _(m,l)(n) contains the DFT-domain representation of {tilde over (h)}_(m,l)(n,k).

Thus, the matrix {tilde over (H)}(n) may be considered to comprise a plurality of matrix coefficients h_(0,1)(n,k), h_(m,2)(n,k), . . . , h_(m,N) _(L) (n,k).

The minimization of the cost function

$\begin{matrix}{{{J_{m}(n)} = {\left( {1 - \lambda_{a}} \right){\sum\limits_{i = 0}^{n}{\lambda_{a}^{n - i}{{\underset{\_}{\overset{\sim}{e}}}_{m}^{H}(i)}{{\underset{\_}{\overset{\sim}{e}}}_{m}(i)}}}}},} & (52)\end{matrix}$ with ·^(H) being the conjugate transpose, leads to the following adaptation algorithm [14]

$\begin{matrix}{{\underline{\tilde{h}}_{m}(n) = \underline{\tilde{h}}_{m}(n - 1) + \left( {1 - \lambda_{a}} \right)\underline{S}^{- 1}(n)\,\underline{W}_{10}^{H}\underline{X}^{H}(n)\underline{W}_{01}^{H}\,\underline{\tilde{e}}_{m}(n)}} & (53)\end{matrix}$ with

$\begin{matrix}{{\underline{S}(n) = \lambda_{a}\underline{S}(n - 1) + \left( {1 - \lambda_{a}} \right)\underline{W}_{10}^{H}\underline{X}^{H}(n)\underline{W}_{01}^{H}\underline{W}_{01}\underline{X}(n)\underline{W}_{10}}.} & (54)\end{matrix}$

The described algorithm can be approximated such that S(n) is replaced by a sparse matrix which allows a frequency bin-wise inversion, leading to a lower computational complexity [14].

For the scenarios considered here, the nonuniqueness problem will usually occur and there are multiple solutions for {tilde over (h)} _(m)(n) which minimize (52). Consequently, the matrix S(n) is singular and has to be regularized for invertibility. In [14], a regularization was proposed which maintains robustness of the algorithm in the case of insufficient power or inactivity of the individual loudspeaker signals. However, in the scenarios considered here, all wave field components are sufficiently excited and this regularization is not effective. Instead, we propose a different regularization by defining the diagonal matrix

$\begin{matrix}{{\underline{D}(n) = \beta\,\operatorname{Diag}\left\{ {\sigma_{0}^{2}(n),\sigma_{1}^{2}(n),\ldots,\sigma_{L_{H}N_{L} - 1}^{2}(n)} \right\}},} & (55)\end{matrix}$ where β is a scale parameter for the regularization. The individual diagonal elements σ_(q) ²(n) are determined such that they are equal to the arithmetic mean of all diagonal entries s_(p) ²(n) of S(n) corresponding to the same frequency bin as σ_(q) ²(n):

$\begin{matrix}{{{\sigma_{q}^{2}(n)} = {\frac{1}{N_{L}}{\sum\limits_{l = 0}^{N_{L} - 1}{s_{p}^{2}(n)}}}},{p = {{{mod}\left( {q,L_{H}} \right)} + {L_{H}l}}},} & (56)\end{matrix}$ where p and q index the diagonal entries starting with zero. The matrix S(n) in (53) is then replaced by (S(n)+D(n)).

In the following, the modified GFDAF according to embodiments is described. Modifications of the GFDAF according to embodiments are presented. These modifications exploit the diagonal dominance of {tilde over (H)}_(m,l)(jω) discussed above. For the derivation, the cost function given in (52) is modified as follows
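The regularization of (55) and (56) amounts to averaging the diagonal of S(n) over all wave field components per frequency bin. A minimal NumPy sketch follows, assuming a dense matrix `S` whose diagonal is ordered component by component with L_H bins each; the function name is hypothetical.

```python
import numpy as np


def regularization_matrix(S: np.ndarray, L_H: int, N_L: int, beta: float) -> np.ndarray:
    """Build D(n) of (55) from the diagonal of S(n) according to (56)."""
    s_diag = np.real(np.diag(S))                 # s_p^2(n), p = 0 .. L_H*N_L - 1
    # Row l holds the bins of component l; average over l for every bin.
    per_bin_mean = s_diag.reshape(N_L, L_H).mean(axis=0)
    sigma2 = np.tile(per_bin_mean, N_L)          # sigma_q^2(n) for every q
    return beta * np.diag(sigma2)


# Small example with assumed L_H = 3 bins and N_L = 2 components.
S_example = np.diag([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
D_example = regularization_matrix(S_example, L_H=3, N_L=2, beta=0.1)
S_regularized = S_example + D_example
```

In this example, bin 0 averages the entries 1 and 4, so both corresponding diagonal entries of D(n) equal 0.1·2.5.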

$\begin{matrix}{{{J_{m}^{mod}(n)} = {{{{\underset{\_}{\overset{\sim}{h}}}_{m}(n)}^{H}{{\underset{\_}{C}}_{m}(n)}{{\underset{\_}{\overset{\sim}{h}}}_{m}(n)}} + {\left( {1 - \lambda_{a}} \right){\sum\limits_{i = 0}^{n}{\lambda_{a}^{n - i}{{\underset{\_}{\overset{\sim}{e}}}_{m}^{H}(i)}{{\underset{\_}{\overset{\sim}{e}}}_{m}(i)}}}}}},} & (57)\end{matrix}$ where the matrix C _(m)(n) is chosen so that components in {tilde over (h)} _(m)(n) corresponding to non-dominant entries in {tilde over (H)}(jω) are penalized more than the others. By a derivation and by using S(n)+C _(m)(n−1)≈S(n)+C _(m)(n), the following adaptation rule is obtained for a minimization of this cost function:

$\begin{matrix}{{\underline{\tilde{h}}_{m}(n) = \underline{\tilde{h}}_{m}(n - 1) + \left( {1 - \lambda_{a}} \right)\left( {\underline{S}(n) + \underline{C}_{m}(n)} \right)^{- 1} \cdot \left( {\underline{W}_{10}^{H}\underline{X}^{H}(n)\underline{W}_{01}^{H}\,\underline{\tilde{e}}_{m}(n) - \underline{C}_{m}(n)\,\underline{\tilde{h}}_{m}(n - 1)} \right)}} & (58)\end{matrix}$

As for the original GFDAF, it is possible to formulate an approximation of this algorithm allowing a frequency bin-wise inversion of (S(n)+C _(m)(n)). The matrix C _(m)(n) is defined by

$\begin{matrix}{{\underline{C}_{m}(n) = \beta_{0}\,w_{c}(n)\,\operatorname{Diag}\left\{ {c_{0}(n),c_{1}(n),\ldots,c_{N_{L}L_{H} - 1}(n)} \right\}}} & (59)\end{matrix}$ with the scale parameter β₀,

$\begin{matrix}{{c_{q}(n)} = \left\{ \begin{matrix}\beta_{1} & {{{{when}\mspace{14mu}\Delta\;{m(q)}} = 0},} \\\beta_{2} & {{{{when}\mspace{14mu}\Delta\;{m(q)}} = 1},} \\1 & {{elsewhere},}\end{matrix} \right.} & (60)\end{matrix}$ and the weighting function w_(c)(n) explained later, where

$\begin{matrix}{{\Delta\,m(q) = \min\left( {\left| {\left\lfloor {q/L_{H}} \right\rfloor - m} \right|,\left| {\left\lfloor {q/L_{H}} \right\rfloor - m - N_{L}} \right|} \right)}} & (61)\end{matrix}$ is the difference of the mode orders |m′−l′| for the couplings described by {tilde over (h)} _(m)(n).

Thus, each c_(q)(n) forms a coupling value for a mode-order pair of a loudspeaker-signal-transformation mode order (└q/L_(H)┘) of the plurality of loudspeaker-signal-transformation mode orders and a first microphone-signal-transformation mode order (m) of the plurality of microphone-signal-transformation mode orders.

The coupling value c_(q)(n) has a first value β₁ when the difference between the first loudspeaker-signal-transformation mode order l (l=└q/L_(H)┘) and the first microphone-signal-transformation mode order m has a first difference value (Δm(q)=0).

The coupling value c_(q)(n) has a second value β₂, different from the first value β₁, when the difference between the first loudspeaker-signal-transformation mode order (l=└q/L_(H)┘) and the first microphone-signal-transformation mode order m has a different second difference value (Δm(q)=1).
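The assignment of coupling values described above can be sketched directly from (60) and (61). The following plain-Python illustration uses hypothetical function names and example parameter values (β₁ = 0, β₂ = 0.5) that are assumptions, not values taken from this description; (61) is implemented exactly as written.

```python
def mode_distance(q: int, m: int, L_H: int, N_L: int) -> int:
    """Delta m(q) of (61) for diagonal index q and microphone-side index m."""
    l = q // L_H   # loudspeaker-side component index of entry q
    return min(abs(l - m), abs(l - m - N_L))


def coupling_values(m: int, L_H: int, N_L: int, beta1: float, beta2: float) -> list:
    """c_q(n) of (60) for all q = 0 .. N_L*L_H - 1."""
    values = []
    for q in range(N_L * L_H):
        delta = mode_distance(q, m, L_H, N_L)
        if delta == 0:
            values.append(beta1)    # same mode order: smallest penalty
        elif delta == 1:
            values.append(beta2)    # neighbouring mode order
        else:
            values.append(1.0)      # all other couplings: full penalty
    return values


# Assumed example: m = 0, L_H = 2 bins, N_L = 4 components.
c = coupling_values(m=0, L_H=2, N_L=4, beta1=0.0, beta2=0.5)
```

In this example the entries belonging to l = 0 receive β₁, those belonging to l = 1 and l = 3 receive β₂ (the latter through the wrap-around term of (61)), and the entries for l = 2 receive the full penalty 1.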

In order to exploit the property of stronger weighted mode couplings for a small |m−l|, the parameters β₁ and β₂ may be chosen inversely to the expected weights for the individual {tilde over (h)} _(m,l)(n), leading to 0≤β₁≤β₂≤1. This choice guides the adaptation algorithm towards identifying a LEMS with mode couplings weighted as shown in FIG. 4. The strength of this non-restrictive constraint may be controlled by the choice of 0≤β₀. However, given C _(m)(n)≠0, a minimization of (57) does not lead to a minimization of (52), which is still the main objective of an AEC. Therefore, we introduce the weighting function

$\begin{matrix}{{w_{c}(n)} = \frac{\sum\limits_{m = 0}^{N_{M} - 1}{J_{m}\left( {n - 1} \right)}}{\max\left\{ {{\sum\limits_{m = 0}^{N_{M} - 1}{{{\underset{\_}{\overset{\sim}{h}}}_{m}^{H}\left( {n - 1} \right)}{{\underset{\_}{\overset{\sim}{h}}}_{m}\left( {n - 1} \right)}}},1} \right\}}} & (62)\end{matrix}$ to ensure an approximate balance of both terms in (57), so that the costs introduced by C _(m)(n) do not hamper the steady-state minimization of (52).

The plurality of vectors {tilde over (h)} ₀(n), . . . , {tilde over (h)} _(m)(n), . . . , {tilde over (h)} _(N) _(M) ⁻¹(n) may be considered as a loudspeaker-enclosure-microphone system description of the loudspeaker-enclosure-microphone system.

As has been explained above, an adaptation rule for adapting a LEMS description according to an embodiment, e.g. the adaptation rule provided in formula (58), can be derived from a modified cost function, e.g. from the modified cost function of formula (57). For this purpose, the gradient of the modified cost function may be set to zero and the adapted LEMS description is determined such that:
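The weighting function of (62) is a ratio of the summed previous costs to the summed filter energies, with the denominator bounded below by 1. A short NumPy sketch, with a hypothetical function name and made-up example numbers:

```python
import numpy as np


def weighting(costs_prev, filters_prev) -> float:
    """w_c(n) of (62).

    costs_prev   : J_m(n-1) for m = 0 .. N_M-1
    filters_prev : complex coefficient vectors h_m(n-1) for the same m
    """
    numerator = float(np.sum(costs_prev))
    # vdot conjugates its first argument, giving h^H h for each m.
    energy = sum(float(np.real(np.vdot(h, h))) for h in filters_prev)
    return numerator / max(energy, 1.0)


# Assumed example with N_M = 2 components.
w_example = weighting([0.5, 1.5], [np.array([1 + 1j, 0]), np.array([0, 2.0])])
```

Here the filter energies are 2 and 4, so the weight is 2/6. When the filters are still close to zero, the lower bound of 1 in the denominator keeps the weight from growing without limit.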

$\begin{matrix}{{\frac{\partial}{\partial{\underset{\_}{\overset{\sim}{h}}}_{m}^{H}}{J_{m}^{{mod}\; 2}(n)}}\overset{!}{=}0} & (63)\end{matrix}$

The procedure is to consider the complex gradient of the modified cost function and determine filter coefficients so that this gradient is zero. Consequently, the filter coefficients minimize the modified cost function.

This will now be explained in detail with reference to the modified cost function of formula (57) and the adaptation rule of formula (58) as an example. For this purpose, the complete derivation from (57) to (58) is provided, which is similar to the derivation of the GFDAF in [14]. As already stated above, the procedure followed here is to consider the complex gradient of (57) and determine filter coefficients so that this gradient is zero. Consequently, the filter coefficients minimize the cost function (57).

It should be noted that we exchanged λ_(a) for λ in order to increase the readability of the document. The remaining notation is identical to formulae (57) and (58), and all undefined quantities refer to those used there. Starting with formula (57) as

$\begin{matrix}{{{J_{m}^{mod}(n)} = {{{{\underset{\_}{\overset{\sim}{h}}}_{m}^{H}(n)}{{\underset{\_}{C}}_{m}(n)}{{\overset{\sim}{\underset{\_}{h}}}_{m}(n)}} + {\left( {1 - \lambda} \right){\sum\limits_{i = 0}^{n}{\lambda^{n - i}{{\underset{\_}{\overset{\sim}{e}}}_{m}^{H}(i)}{{\underset{\_}{\overset{\sim}{e}}}_{m}(i)}}}}}},} & (64)\end{matrix}$ the error {tilde over (e)} _(m)(n) is replaced by the error that would result if the filter coefficients {tilde over (h)} _(m) (which have yet to be determined) were used for all previous input signals. Thus, a slightly modified cost function

$\begin{matrix}{{J_{m}^{{mod}\; 2}(n)} = {{{\underset{\_}{\overset{\sim}{h}}}_{m}^{H}{{\underset{\_}{C}}_{m}(n)}{\overset{\sim}{\underset{\_}{h}}}_{m}} + {\left( {1 - \lambda} \right){\sum\limits_{i = 0}^{n}{\lambda^{n - i}{{\underset{\_}{\overset{\sim}{e}}}_{m}^{H}(i)}{{\underset{\_}{\overset{\sim}{e}}}_{m}(i)}}}}}} & (65)\end{matrix}$ is obtained with

$\begin{matrix}{{\underline{\tilde{e}}_{m}(n) = \underline{\tilde{d}}_{m}(n) - \underline{W}_{01}\underline{X}(n)\underline{W}_{10}\,\underline{\tilde{h}}_{m}},} & (66)\end{matrix}$ in contrast to formula (49), which is

$\begin{matrix}{{\underline{\tilde{e}}_{m}(n) = \underline{\tilde{d}}_{m}(n) - \underline{W}_{01}\underline{X}(n)\underline{W}_{10}\,\underline{\tilde{h}}_{m}(n - 1)}.} & (67)\end{matrix}$

This distinction is recommended to avoid ambiguities regarding the not perfectly consistent notation in [14]. Inserting (66) into (65), we obtain

$\begin{matrix}\begin{matrix}{J_{m}^{{mod}\; 2}(n) = \underline{\tilde{h}}_{m}^{H}\,\underline{C}_{m}(n)\,\underline{\tilde{h}}_{m} + \left( {1 - \lambda} \right)\sum\limits_{i = 0}^{n}\lambda^{n - i}\left( {\underline{\tilde{d}}_{m}(i) - \underline{W}_{01}\underline{X}(i)\underline{W}_{10}\underline{\tilde{h}}_{m}} \right)^{H} \cdot \left( {\underline{\tilde{d}}_{m}(i) - \underline{W}_{01}\underline{X}(i)\underline{W}_{10}\underline{\tilde{h}}_{m}} \right)} \\ {= \underline{\tilde{h}}_{m}^{H}\,\underline{C}_{m}(n)\,\underline{\tilde{h}}_{m} + \left( {1 - \lambda} \right)\sum\limits_{i = 0}^{n}\lambda^{n - i}\left( {\underline{\tilde{d}}_{m}^{H}(i)\underline{\tilde{d}}_{m}(i) - \underline{\tilde{h}}_{m}^{H}\underline{W}_{10}^{H}\underline{X}^{H}(i)\underline{W}_{01}^{H}\underline{\tilde{d}}_{m}(i)} \right.} \\ {\left. {- \underline{\tilde{d}}_{m}^{H}(i)\underline{W}_{01}\underline{X}(i)\underline{W}_{10}\underline{\tilde{h}}_{m} + \underline{\tilde{h}}_{m}^{H}\underline{W}_{10}^{H}\underline{X}^{H}(i)\underline{W}_{01}^{H}\underline{W}_{01}\underline{X}(i)\underline{W}_{10}\underline{\tilde{h}}_{m}} \right)}\end{matrix} & (68)\end{matrix}$ as the function to be minimized by {tilde over (h)} _(m). The complex gradient of (68) with respect to {tilde over (h)} _(m) ^(H) is given by

$\begin{matrix}{{\frac{\partial}{\partial\underline{\tilde{h}}_{m}^{H}}J_{m}^{{mod}\; 2}(n) = \underline{C}_{m}(n)\,\underline{\tilde{h}}_{m} + \left( {1 - \lambda} \right)\sum\limits_{i = 0}^{n}\lambda^{n - i}\left( {- \underline{W}_{10}^{H}\underline{X}^{H}(i)\underline{W}_{01}^{H}\underline{\tilde{d}}_{m}(i) + \underline{W}_{10}^{H}\underline{X}^{H}(i)\underline{W}_{01}^{H}\underline{W}_{01}\underline{X}(i)\underline{W}_{10}\underline{\tilde{h}}_{m}} \right)}.} & (69) \\ {\text{Requiring}} & \; \\ {{\frac{\partial}{\partial\underline{\tilde{h}}_{m}^{H}}J_{m}^{{mod}\; 2}(n)}\overset{!}{=}0} & (70)\end{matrix}$ can be used to determine {tilde over (h)} _(m) such that J_(m) ^(mod2)(n) is minimized. Defining

$\begin{matrix}\begin{matrix}{\underline{S}(n) = \left( {1 - \lambda} \right)\sum\limits_{i = 0}^{n}\lambda^{n - i}\underline{W}_{10}^{H}\underline{X}^{H}(i)\underline{W}_{01}^{H}\underline{W}_{01}\underline{X}(i)\underline{W}_{10}} \\ {= \lambda\,\underline{S}(n - 1) + \left( {1 - \lambda} \right)\underline{W}_{10}^{H}\underline{X}^{H}(n)\underline{W}_{01}^{H}\underline{W}_{01}\underline{X}(n)\underline{W}_{10}}\end{matrix} & (71) \\ {and} & \; \\ \begin{matrix}{\underline{s}_{m}(n) = \left( {1 - \lambda} \right)\sum\limits_{i = 0}^{n}\lambda^{n - i}\underline{W}_{10}^{H}\underline{X}^{H}(i)\underline{W}_{01}^{H}\underline{\tilde{d}}_{m}(i)} \\ {= \lambda\,\underline{s}_{m}(n - 1) + \left( {1 - \lambda} \right)\underline{W}_{10}^{H}\underline{X}^{H}(n)\underline{W}_{01}^{H}\underline{\tilde{d}}_{m}(n)}\end{matrix} & (72)\end{matrix}$ we may additionally consider (69) and (70) to write

$\begin{matrix}{{\left( {\underline{S}(n) + \underline{C}_{m}(n)} \right)\underline{\tilde{h}}_{m} = \underline{s}_{m}(n)}.} & (73)\end{matrix}$

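Now, we assume that a solution {tilde over (h)} _(m)(n−1) for {tilde over (h)} _(m) was obtained in the previous iteration, which fulfills

$\begin{matrix}{{\left( {\underline{S}(n - 1) + \underline{C}_{m}(n - 1)} \right)\underline{\tilde{h}}_{m}(n - 1) = \underline{s}_{m}(n - 1)},} & (74)\end{matrix}$ and we want to obtain {tilde over (h)} _(m)(n) such that

$\begin{matrix}{{\left( {\underline{S}(n) + \underline{C}_{m}(n)} \right)\underline{\tilde{h}}_{m}(n) = \underline{s}_{m}(n)}.} & (75)\end{matrix}$

Replacing s _(m)(n) and s _(m)(n−1) in (72), i.e. in

$\begin{matrix}{{\underline{s}_{m}(n) = \lambda\,\underline{s}_{m}(n - 1) + \left( {1 - \lambda} \right)\underline{W}_{10}^{H}\underline{X}^{H}(n)\underline{W}_{01}^{H}\underline{\tilde{d}}_{m}(n)},} & (76)\end{matrix}$ by (S(n)+C _(m)(n)){tilde over (h)} _(m)(n) and (S(n−1)+C _(m)(n−1)){tilde over (h)} _(m)(n−1), respectively, we obtain

$\begin{matrix}{{\left( {\underline{S}(n) + \underline{C}_{m}(n)} \right)\underline{\tilde{h}}_{m}(n) = \lambda\,\underline{S}(n - 1)\underline{\tilde{h}}_{m}(n - 1) + \lambda\,\underline{C}_{m}(n - 1)\underline{\tilde{h}}_{m}(n - 1) + \left( {1 - \lambda} \right)\underline{W}_{10}^{H}\underline{X}^{H}(n)\underline{W}_{01}^{H}\underline{\tilde{d}}_{m}(n)}.} & (77)\end{matrix}$

Replacing λS(n−1) by reformulating (71) to

$\begin{matrix}{{\underline{S}(n) - \left( {1 - \lambda} \right)\underline{W}_{10}^{H}\underline{X}^{H}(n)\underline{W}_{01}^{H}\underline{W}_{01}\underline{X}(n)\underline{W}_{10} = \lambda\,\underline{S}(n - 1)},} & (78)\end{matrix}$ formula (79) is obtained:

$\begin{matrix}\begin{matrix}{\left( {\underline{S}(n) + \underline{C}_{m}(n)} \right)\underline{\tilde{h}}_{m}(n) = \underline{S}(n)\underline{\tilde{h}}_{m}(n - 1) + \lambda\,\underline{C}_{m}(n - 1)\underline{\tilde{h}}_{m}(n - 1)} \\ {- \left( {1 - \lambda} \right)\underline{W}_{10}^{H}\underline{X}^{H}(n)\underline{W}_{01}^{H}\underline{W}_{01}\underline{X}(n)\underline{W}_{10}\underline{\tilde{h}}_{m}(n - 1) + \left( {1 - \lambda} \right)\underline{W}_{10}^{H}\underline{X}^{H}(n)\underline{W}_{01}^{H}\underline{\tilde{d}}_{m}(n)}.\end{matrix} & (79)\end{matrix}$

Adding 0=C _(m)(n−1){tilde over (h)} _(m)(n−1)−C _(m)(n−1){tilde over (h)} _(m)(n−1), we may write

$\begin{matrix}\begin{matrix}{\left( {\underline{S}(n) + \underline{C}_{m}(n)} \right)\underline{\tilde{h}}_{m}(n) = \left( {\underline{S}(n) + \underline{C}_{m}(n - 1)} \right)\underline{\tilde{h}}_{m}(n - 1) - \left( {1 - \lambda} \right)\underline{C}_{m}(n - 1)\underline{\tilde{h}}_{m}(n - 1)} \\ {- \left( {1 - \lambda} \right)\underline{W}_{10}^{H}\underline{X}^{H}(n)\underline{W}_{01}^{H}\underline{W}_{01}\underline{X}(n)\underline{W}_{10}\underline{\tilde{h}}_{m}(n - 1) + \left( {1 - \lambda} \right)\underline{W}_{10}^{H}\underline{X}^{H}(n)\underline{W}_{01}^{H}\underline{\tilde{d}}_{m}(n)} \\ {= \left( {\underline{S}(n) + \underline{C}_{m}(n - 1)} \right)\underline{\tilde{h}}_{m}(n - 1) + \left( {1 - \lambda} \right)\left( {\underline{W}_{10}^{H}\underline{X}^{H}(n)\underline{W}_{01}^{H}\underline{\tilde{d}}_{m}(n)} \right.} \\ {\left. {- \underline{W}_{10}^{H}\underline{X}^{H}(n)\underline{W}_{01}^{H}\underline{W}_{01}\underline{X}(n)\underline{W}_{10}\underline{\tilde{h}}_{m}(n - 1) - \underline{C}_{m}(n - 1)\underline{\tilde{h}}_{m}(n - 1)} \right)}.\end{matrix} & (80)\end{matrix}$

Using

$\begin{matrix}{{\underline{W}_{10}^{H}\underline{X}^{H}(n)\underline{W}_{01}^{H}\underline{\tilde{e}}_{m}(n) = \underline{W}_{10}^{H}\underline{X}^{H}(n)\underline{W}_{01}^{H}\underline{\tilde{d}}_{m}(n) - \underline{W}_{10}^{H}\underline{X}^{H}(n)\underline{W}_{01}^{H}\underline{W}_{01}\underline{X}(n)\underline{W}_{10}\underline{\tilde{h}}_{m}(n - 1)}} & (81)\end{matrix}$ and formula (67), we obtain

$\begin{matrix}{{\left( {\underline{S}(n) + \underline{C}_{m}(n)} \right)\underline{\tilde{h}}_{m}(n) = \left( {\underline{S}(n) + \underline{C}_{m}(n - 1)} \right)\underline{\tilde{h}}_{m}(n - 1) + \left( {1 - \lambda} \right)\left( {\underline{W}_{10}^{H}\underline{X}^{H}(n)\underline{W}_{01}^{H}\underline{\tilde{e}}_{m}(n) - \underline{C}_{m}(n - 1)\underline{\tilde{h}}_{m}(n - 1)} \right)}} & (82)\end{matrix}$ and, using S(n)+C _(m)(n)≈S(n)+C _(m)(n−1), finally

$\begin{matrix}{{\underline{\tilde{h}}_{m}(n) = \underline{\tilde{h}}_{m}(n - 1) + \left( {1 - \lambda} \right)\left( {\underline{S}(n) + \underline{C}_{m}(n)} \right)^{- 1} \cdot \left( {\underline{W}_{10}^{H}\underline{X}^{H}(n)\underline{W}_{01}^{H}\underline{\tilde{e}}_{m}(n) - \underline{C}_{m}(n - 1)\underline{\tilde{h}}_{m}(n - 1)} \right)}.} & (83)\end{matrix}$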
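One iteration of the resulting update may be sketched as follows. This is a dense-matrix illustration only, not the frequency bin-wise approximation mentioned above: the function name is hypothetical, and the single matrix `A` is an assumption standing in for the product W₀₁X(n)W₁₀.

```python
import numpy as np


def modified_gfdaf_step(h_prev, S_prev, A, d, C_prev, C_now, lam):
    """One update of the modified adaptation rule.

    h_prev : previous coefficients h_m(n-1)
    S_prev : previous matrix S(n-1)
    A      : stand-in for W_01 X(n) W_10 (assumption of this sketch)
    d      : observed DFT-domain signal d_m(n)
    C_prev : penalty matrix C_m(n-1); C_now : penalty matrix C_m(n)
    lam    : forgetting factor lambda
    """
    S = lam * S_prev + (1 - lam) * A.conj().T @ A           # recursion of (71)
    e = d - A @ h_prev                                      # a-priori error, (67)
    rhs = A.conj().T @ e - C_prev @ h_prev
    h = h_prev + (1 - lam) * np.linalg.solve(S + C_now, rhs)  # update of (83)
    return h, S, e
```

If the penalty matrix does not change between iterations, the approximation S(n)+C_m(n)≈S(n)+C_m(n−1) becomes exact, and the updated coefficients satisfy the normal equations (S(n)+C_m(n))h_m(n)=s_m(n) whenever the previous coefficients satisfied them for n−1.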

Some of the above-described embodiments provide a loudspeaker-enclosure-microphone system description based on determining an error signal e(n).

Another embodiment, however, provides a loudspeaker-enclosure-microphone system description without determining an error signal.

Considering formulae (71) and (72), we may reformulate (73) so that we can obtain the filter coefficients {tilde over (h)} _(m) without determining an error signal by using

$\begin{matrix}{{\underline{\tilde{h}}_{m}(n) = \left( {\underline{S}(n) + \underline{C}_{m}(n)} \right)^{- 1}\underline{s}_{m}(n)}.} & (84)\end{matrix}$
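The error-free variant can be illustrated by accumulating S(n) and s_m(n) recursively and solving the resulting linear system. In the NumPy sketch below, the function name is hypothetical and each matrix in `A_frames` is an assumed stand-in for W₀₁X(i)W₁₀ of one frame.

```python
import numpy as np


def solve_without_error(A_frames, d_frames, C, lam):
    """Compute h_m(n) from (84) without forming an error signal.

    A_frames : list of matrices standing in for W_01 X(i) W_10
    d_frames : list of observation vectors d_m(i)
    C        : penalty matrix C_m(n)
    lam      : forgetting factor lambda
    """
    n_coeff = A_frames[0].shape[1]
    S = np.zeros((n_coeff, n_coeff), dtype=complex)
    s = np.zeros(n_coeff, dtype=complex)
    for A, d in zip(A_frames, d_frames):
        S = lam * S + (1 - lam) * A.conj().T @ A    # recursion of (71)
        s = lam * s + (1 - lam) * A.conj().T @ d    # recursion of (72)
    return np.linalg.solve(S + C, s)                # direct solution, (84)
```

With a vanishing penalty matrix and sufficiently rich, noise-free data, this recovers the true coefficients; a nonzero C_m(n) pulls the penalized entries towards zero instead.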

The loudspeaker-enclosure-microphone system description provided by one of the above-described embodiments can be employed for various applications. For example, the loudspeaker-enclosure-microphone system description may be employed for listening room equalization (LRE), for acoustic echo cancellation (AEC) or, e.g., for active noise control (ANC).

At first, it is explained how to employ the above-described embodiments for acoustic echo cancellation (AEC).

The application of the above-described embodiments for AEC has already been described above. For example, in FIG. 3, an error signal e(n) is output as the result of the apparatus. This error signal e(n) is the time-domain error signal of the wave-domain error signal {tilde over (e)}(n). {tilde over (e)}(n) itself depends on {tilde over (d)}(n), being the wave-domain representation of the recorded microphone signals, and {tilde over (y)}(n), being the wave-domain microphone signal estimate. The wave-domain microphone signal estimate {tilde over (y)}(n) itself may be provided by the system description application unit 150, which generates the wave-domain microphone signal estimate {tilde over (y)}(n) based on the loudspeaker-enclosure-microphone system description {tilde over (h)} ₀(n), . . . , {tilde over (h)} _(m)(n), . . . , {tilde over (h)} _(N) _(M) ⁻¹(n).

If, for example, a speaker, which represents a local source, is located inside a LEMS, then the voices produced by the speaker will not be compensated and still remain in the error signal e(n). All other sounds, however, should be compensated/cancelled in the error signal e(n). Thus, the error signal e(n) represents the voices produced by a local source inside the LEMS, e.g. a speaker, but without any acoustic echoes, because these echoes have already been cancelled by forming the difference between the actual microphone signals {tilde over (d)}(n) and the microphone signal estimate {tilde over (y)}(n).

Thus, the quantity e(n) already describes the echo compensated signal.

In the following, the application of the above-described embodiments for active noise control (ANC) is explained.

The application of state-of-the-art WDAF for ANC has already been presented in [15], but in [15], a very limited wave-domain model was used, for which the nonuniqueness problem does not occur. No measures to improve the robustness in the presence of the nonuniqueness problem were presented.

Here, we describe a conventional ANC system in order to point out that the application of this invention is not limited to systems working in the wave domain, although an integration in such a system would be a natural choice. Please note that although the filters for noise cancellation are determined according to a conventional model, the system identification is conducted in the wave domain.

FIG. 6a shows an exemplary loudspeaker and microphone setup used for ANC. The outer microphone array is termed reference array, the inner microphone array is termed error array. In FIG. 6a, a noise source is depicted emitting a sound field which should ideally be cancelled within the listening area. As the signal of the noise source is unknown, it has to be measured. To this end, an additional microphone array outside the loudspeaker array is needed in addition to the previously considered array setup. This array is referred to as the reference array, while the microphone array inside the loudspeaker array is referred to as the error array.

FIG. 6b illustrates a block diagram of an ANC system. R represents sound propagation from the noise sources to the reference array. G(n) represents prefilters to facilitate ANC. P illustrates the sound propagation from the reference array to the error array (primary path), and S is the sound propagation from the loudspeakers to the error array (secondary path).

In FIG. 6b, the unknown signal of the N_(R) microphones of the reference array is described by

$\begin{matrix}{{d(n) = R\,n(n)}} & (85)\end{matrix}$ using the previously introduced vector and matrix notation. Here, d(n) describes the signal we can obtain from the reference array. This signal is filtered according to

$\begin{matrix}{{x(n) = G(n)\,d(n)}} & (86)\end{matrix}$ to obtain the N_(L) loudspeaker signals x(n), which are then emitted by the loudspeaker array to cancel the noise signal. To ensure a cancellation, the N_(E) signals from the error array are considered, which capture the superposition

$\begin{matrix}{{e(n) = P\,d(n) + S\,x(n)},} & (87)\end{matrix}$ where the matrix P describes the propagation of the noise from the reference array to the error array and is referred to as the primary path. The matrix S describes the secondary path from the loudspeakers to the error array. For ANC, G(n) is ideally determined such that

$\begin{matrix}{{- S\,G(n) = P}} & (88)\end{matrix}$ so that the error signal e(n) vanishes. Since the MIMO impulse responses P and S are in general unknown and may also change over time, both have to be identified. So we consider the identified systems Ŝ(n) and {circumflex over (P)}(n) to obtain G(n) such that

$\begin{matrix}{{- \hat{S}(n)\,G(n) = \hat{P}(n)}.} & (89)\end{matrix}$
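One way to obtain prefilters that satisfy (89) at least in the least-squares sense is a pseudo-inverse. The sketch below is an assumption-laden illustration: it treats the identified paths as plain matrices, ignores the convolution structure and causality of the actual filters, and uses a hypothetical function name.

```python
import numpy as np


def anc_prefilter(S_hat: np.ndarray, P_hat: np.ndarray) -> np.ndarray:
    """Least-squares solution G of -S_hat G = P_hat via the pseudo-inverse."""
    return -np.linalg.pinv(S_hat) @ P_hat


# Assumed toy dimensions: N_E = 3 error microphones, N_L = 5 loudspeakers,
# N_R = 4 reference microphones.
rng = np.random.default_rng(3)
S_hat = rng.standard_normal((3, 5))
P_hat = rng.standard_normal((3, 4))
G = anc_prefilter(S_hat, P_hat)
residual = P_hat + S_hat @ G    # plays the role of the error in (87)
```

With more loudspeakers than error microphones and a full-rank secondary path, the residual vanishes; otherwise the pseudo-inverse gives the minimum-norm least-squares compromise.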

Typically, there are fewer noise sources than reference microphones (N_(S)<N_(R)), so the nonuniqueness problem does occur for the identification of P. This is equivalent to the considered AEC scenario in the prototype description, with n(n) in the role of {circumflex over (x)}(n), R in the role of G_(RS), and P in the role of H. Moreover, there is typically also no unique solution for the identification of S, as there are typically more loudspeakers than noise sources (N_(S)<N_(L)) and x(n) only describes the filtered signals of the noise sources. Obviously, the invention can be used to improve the identification of P and S, which would then increase the robustness of the ANC system. This can be done by obtaining wave-domain identifications {tilde over (P)}(n) and {tilde over (S)}(n) of P and S, which are then transformed to their representation in the conventional domain by

$\begin{matrix}{{\hat{P}(n) = T_{1}\,\tilde{P}(n)\,T_{2}^{- 1}},} & (90) \\ {{\hat{S}(n) = T_{3}\,\tilde{S}(n)\,T_{2}^{- 1}},} & (91)\end{matrix}$ with T₁ being the transform of the reference signals d(n) to the wave domain and T₃ being the transform of the loudspeaker signals x(n) to the wave domain. Given that the error signals e(n) are transformed to the wave domain by T₂, T₂ ⁻¹ describes the inverse of this transform or an appropriate approximation.

In the following, listening room equalization is considered. Here, the embodiments for providing a loudspeaker-enclosure-microphone system description may be employed for improving a wave field synthesis (WFS) reproduction by being part of a listening room equalization (LRE) system. WFS (see, e.g. [1]) is used to achieve a highly detailed spatial reproduction of an acoustic scene, overcoming the limitations of a sweet spot by using an array of typically several tens to hundreds of loudspeakers. The loudspeaker signals for WFS are usually determined assuming free-field conditions. As a consequence, an enclosing room shall not exhibit significant wall reflections to avoid a distortion of the synthesized wave field.

In many application scenarios, the acoustic treatment necessary to achieve such room properties may be too expensive or impractical. An alternative to acoustical countermeasures is to compensate for the wall reflections by means of a listening room equalization (LRE), often termed listening room compensation. To this end, the reproduction signals are filtered to pre-equalize the MIMO room system response from the loudspeakers to the positions of multiple microphones, ideally achieving an equalization at any point in the listening area. The equalizers are determined according to the impulse responses for each loudspeaker-microphone path. As the MIMO loudspeaker-enclosure-microphone system (LEMS) is expected to change over time, it has to be continuously identified by adaptive filtering. The task of LRE has often been addressed in the literature. However, systems relying on a system identification of the LEMS have barely been investigated, notably because of the nonuniqueness problem. Employing a loudspeaker-enclosure-microphone system description provided according to one of the above-described embodiments can significantly improve the system identification and therefore also the equalization results.

The above-described embodiments may also be employed together with any conventional LRE system. The above-described embodiments are not limited to loudspeaker-enclosure-microphone systems working in the wave domain, although using the above-described embodiments with such loudspeaker-enclosure-microphone systems is of advantage. It should be noted that although the equalizers are determined according to a conventional model, in the following, the system identification is considered to be conducted in the wave domain.

In the following, a description of an LRE system according to an embodiment is provided. Inter alia, the integration of the invention in an LRE system is explained. For this purpose, reference is made to FIG. 6c.

FIG. 6c illustrates a block diagram of an LRE system. T₁ and T₂ depict transforms to the wave domain. G(n) depicts the equalizers. H shows the LEMS. {tilde over (H)}(n) illustrates the identified LEMS and H⁽⁰⁾ depicts the desired impulse response.

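In the embodiment of FIG. 6c, an original loudspeaker signal x(n) is equalized such that an equalized loudspeaker signal x′(n) is obtained according to

$\begin{matrix}{{x^{\prime}(n) = G(n)\,x(n)},} & (92)\end{matrix}$ where

$\begin{matrix}{{x^{\prime}(n) = \left( {\left( {x_{0}^{\prime}(n)} \right)^{T},\left( {x_{1}^{\prime}(n)} \right)^{T},\ldots,\left( {x_{N_{L} - 1}^{\prime}(n)} \right)^{T}} \right)^{T}}} & (93)\end{matrix}$ with the components

$\begin{matrix}{{x_{\lambda^{\prime}}^{\prime}(n) = \left( {x_{\lambda^{\prime}}^{\prime}\left( {nL_{F} - L_{X}^{\prime} + 1} \right),x_{\lambda^{\prime}}^{\prime}\left( {nL_{F} - L_{X}^{\prime} + 2} \right),\ldots,x_{\lambda^{\prime}}^{\prime}\left( {nL_{F}} \right)} \right)^{T}}} & (94)\end{matrix}$ capturing L′_(X) time samples x′_(λ′)(k) of the equalized loudspeaker signal λ′ at time instant k.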

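Similarly, x(n) is defined as

$\begin{matrix}{{x(n) = \left( {\left( {x_{0}(n)} \right)^{T},\left( {x_{1}(n)} \right)^{T},\ldots,\left( {x_{N_{L} - 1}(n)} \right)^{T}} \right)^{T}}} & (95)\end{matrix}$ with the components

$\begin{matrix}{{x_{\lambda}(n) = \left( {x_{\lambda}\left( {nL_{F} - L_{X} + 1} \right),x_{\lambda}\left( {nL_{F} - L_{X} + 2} \right),\ldots,x_{\lambda}\left( {nL_{F}} \right)} \right)^{T}}} & (96)\end{matrix}$ capturing L_(X)≤L′_(X) time samples x_(λ)(k) of the unequalized loudspeaker signal λ at time instant k.

The matrix G(n) is structured such that it describes a convolution operation according to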

$\begin{matrix}{{{{x^{\prime}}_{\lambda^{\prime}}(k)} = {\sum\limits_{\lambda = 0}^{N_{L} - 1}{\sum\limits_{\kappa = 0}^{L_{H} - 1}{{x_{\lambda}\left( {k - \kappa} \right)}{g_{\lambda^{\prime},\lambda}\left( {\kappa,n} \right)}}}}},} & (97)\end{matrix}$ where g_(λ′,λ)(k,n) is the equalizer impulse response from the original loudspeaker signal λ to the equalized loudspeaker signal λ′. The matrix and vector notation above acts as a prototype for all considered system and signal descriptions. Although the dimensions of other signal vectors and system matrices may differ, the underlying structure remains the same.

Ideally, an LRE system achieves equalizers such that

$\begin{matrix}{{H^{(0)} = H\,G(n)},} & (98)\end{matrix}$ where H⁽⁰⁾ is the desired free-field impulse response between the loudspeakers and the microphones. As the true LEMS impulse responses H are usually not known, this is achieved for the identified system Ĥ(n) such that

$\begin{matrix}{{\hat{H}(n)\,G(n) = H^{(0)}},} & (99)\end{matrix}$ where we assume a coefficient transform according to

$\begin{matrix}{{\hat{H}(n) = T_{1}\,\tilde{H}(n)\,T_{2}^{- 1}}} & (100)\end{matrix}$ with T₁ being the transform of the equalized loudspeaker signals to the wave domain and T₂ ⁻¹ being the matrix formulation of the appropriate inverse transform of T₂, which transforms the microphone signals to the wave domain.

As Ĥ(n) is the identified system, there may be infinitely many solutions for Ĥ(n) for a given LEMS H, depending on the correlation properties of the loudspeaker signals. As the solution for G(n) according to (99) depends on Ĥ(n) and the set of possible solutions for Ĥ(n) can vary with changing correlation properties of the loudspeaker signals, an LRE system shows very poor robustness against the nonuniqueness problem. At this point, the proposed invention can improve the system identification and therefore also the robustness of the LRE.

In the following, a description of two algorithms to obtain G(n) from Ĥ(n) and H⁽⁰⁾ is provided. At first, however, the LRE signal model referred to for the description of the two algorithms is described. In particular, the signal model of a multichannel LRE system is explained considering FIG. 6d.

FIG. 6d illustrates a signal model of an LRE system. In FIG. 6d, G(n) represents the equalizers, H is the LEMS, Ĥ(n) represents the identified LEMS, H⁽⁰⁾ is the desired impulse response, x(n) is the original loudspeaker signal, x′(n) is the equalized loudspeaker signal and d(n) is the microphone signal.

The loudspeaker signal vector x(n) in FIG. 6d comprises a block, indexed by n, of L_(X) time-domain samples of all N_(L) loudspeaker signals:
$$x(n) = \left( x_1(nL_F - L_X + 1), \ldots, x_1(nL_F),\, x_2(nL_F - L_X + 1), \ldots, x_2(nL_F),\, \ldots,\, x_{N_L}(nL_F) \right), \tag{101}$$
where x_(l)(k) is a time-domain sample of the l-th loudspeaker signal at time instant k and L_(F) is the frame shift. This signal should be optimally reproduced under free-field conditions. To remove the unwanted influence of the enclosing room on the reproduced sound field, we pre-equalize these signals through G(n) such that

$$x'(n) = G(n)\,x(n), \qquad x'_{\lambda}(k) = \sum_{l=0}^{N_L-1} \sum_{\kappa=0}^{L_G-1} x_{l}(k-\kappa)\, g_{\lambda,l}(\kappa,n), \tag{102}$$
where x′(n) has the same structure as x(n), but comprises only the latest L_(X)−L_(G)+1 time samples x′_(λ)(k) of the equalized loudspeaker signals.

It should be noted that in formulae (102) to (124) and the part of the description that refers to formulae (102) to (124), index l may be used as an index for a loudspeaker signal rather than an index for a wave-field component. Moreover, it should be noted that in formulae (102) to (124) and the part of the description that refers to formulae (102) to (124), index m may be used as an index for a microphone signal rather than an index for a wave-field component.

The unequalized loudspeaker signals x(n) are referred to as original loudspeaker signals in the following. The equalizer impulse responses g_(λ,l)(k,n) of length L_(G) from the original loudspeaker signal l to the actual loudspeaker signal λ have to be determined via identifying the LRE system first. To this end, the signals x′(n) are fed to the LEMS and the resulting microphone signals are observed:

$$d(n) = H\,x'(n), \qquad d_{m}(k) = \sum_{\lambda=0}^{N_L-1} \sum_{\kappa=0}^{L_H-1} x'_{\lambda}(k-\kappa)\, h_{m,\lambda}(\kappa), \tag{103}$$
where h_(m,λ)(k) describes the room impulse response of length L_(H) from loudspeaker λ to microphone m and is assumed to be time-invariant in this paper. Here, L_(X)−L_(G)−L_(H)+2 time samples d_(m)(k) of the N_(M) microphone signals are comprised in d(n). Using the observations of x′(n) and d(n), the system H is identified by Ĥ(n) by means of an adaptive filtering algorithm, e.g., the GFDAF [1], which minimizes the squared error term

$$\sum_{i=0}^{n} \lambda_a^{n-i}\, e^H(i)\, e(i), \quad \text{with} \quad e(n) = d(n) - \hat{H}(n)\,x'(n), \tag{104}$$
where λ_(a) is the exponential forgetting factor. The coefficients contained in Ĥ(n) are used for the equalizer determination as explained in the following section.
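By way of illustration only, the exponentially weighted error term of formula (104) may be evaluated either directly or recursively, as the following Python sketch shows. The function names are hypothetical, and the sketch only evaluates the cost for given error vectors; it does not implement the GFDAF itself.

```python
import numpy as np

def weighted_squared_error(errors, forgetting):
    """Direct evaluation of sum_i forgetting^(n-i) * e(i)^H e(i).

    errors: sequence of error vectors e(0), ..., e(n) (real or complex).
    forgetting: exponential forgetting factor, 0 < forgetting <= 1.
    """
    n = len(errors) - 1
    # np.vdot conjugates its first argument, so vdot(e, e) equals e^H e.
    return sum(forgetting ** (n - i) * np.vdot(e, e).real
               for i, e in enumerate(errors))

def recursive_cost(previous_cost, e, forgetting):
    """One recursion step: cost(n) = forgetting * cost(n-1) + e(n)^H e(n)."""
    return forgetting * previous_cost + np.vdot(e, e).real
```

The recursive form makes the role of the forgetting factor visible: older blocks are attenuated by one further factor of λ_(a) each time a new block arrives, so a value such as 0.95 lets the adaptation follow slow changes of the LEMS.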

In the following, the determination of the equalizer coefficients is explained starting with the FxGFDAF, which was the inspiration for the proposed approach explained afterward.

The signal model for the Filtered-X GFDAF (FxGFDAF) is shown in FIG. 6e, which illustrates a filtered-X structure. In FIG. 6e, Ĥ(n) depicts the identified LEMS, Ĝ(n) shows the equalizers, H⁽⁰⁾ represents the free-field impulse responses, x̊(n) is an excitation signal, z̊(n) depicts a filtered excitation signal, and d̊(n) is a desired microphone signal.

The excitation signal x̊(n) of FIG. 6e is structured as x(n) but comprises 2L_(G)+L_(H)−1 samples for each l and may be equal to x(n) or simply a white-noise signal [25]. The desired microphone signals comprise 2L_(G) samples for each m and are obtained according to
$$\mathring{d}_l(n) = H^{(0)}\,\mathring{x}_l(n), \tag{105}$$
where H⁽⁰⁾ is structured like H, containing the desired free-field impulse responses h_(m,l)⁽⁰⁾, and x̊_(l)(n) is defined as x̊(n) for a sole excitation of loudspeaker l with all other components set to zero. The equalizers for every original loudspeaker signal are determined separately, assuming that not only the superposition of all signals, but also each individual original signal should be equalized. This sufficient (although not necessary) requirement for a global equalization increases the robustness of the solution against changing correlation properties of the loudspeaker signals and reduces the dimensions of the inverse in formula (114). The equalizer responses g_(λ,l)(k,n) are captured by the vectors g_(λ,l)(n), which are then transformed to the DFT domain and concatenated:
$$g_{\lambda,l}(n) = \left( g_{\lambda,l}(0,n),\, g_{\lambda,l}(1,n),\, \ldots,\, g_{\lambda,l}(L_G-1,n) \right)^T, \tag{106}$$
$$\underline{g}_l(n) = \left( (F_{L_G}\, g_{0,l}(n))^T, \ldots, (F_{L_G}\, g_{N_L-1,l}(n))^T \right)^T, \tag{107}$$
using the unitary L_(G)×L_(G) DFT matrix F_(L_G). For time-domain zero padding and windowing operations, the following definitions are provided:

$$\underline{W}_{01} = I_{N_M} \otimes \left( F_{L_G}\,(0, I_{L_G})\,F_{2L_G}^H \right), \tag{108}$$
$$\underline{W}_{10} = I_{N_L} \otimes \left( F_{2L_G}\,(0, I_{L_G}, 0)^T\,F_{L_G}^H \right), \tag{109}$$
with the Kronecker product denoted by ⊗ and the N_(M)×N_(M) identity matrix I_(N_M). Thus, the error to be minimized in the DFT domain may be defined by

$$\underline{\mathring{e}}_l(n) = \left( I_{N_M} \otimes F_{L_G} \right) \mathring{d}_l(n) - \underline{W}_{01}\, \underline{\mathring{Z}}_l(n)\, \underline{W}_{10}\, \underline{g}_l(n-1). \tag{110}$$

Here, the matrix Z̊_(l)(n) is constructed from the components of z̊(n),
$$\underline{\mathring{Z}}_{m,\lambda,l}(n) = \mathrm{Diag}\left\{ F_{2L_G}\, \mathring{z}_{m,\lambda,l}(n) \right\}, \tag{111}$$
according to the following example for N_(L)=3, N_(M)=2:

$$\underline{\mathring{Z}}_l(n) = \begin{pmatrix} \underline{\mathring{Z}}_{0,0,l}(n) & \underline{\mathring{Z}}_{0,1,l}(n) & \underline{\mathring{Z}}_{0,2,l}(n) \\ \underline{\mathring{Z}}_{1,0,l}(n) & \underline{\mathring{Z}}_{1,1,l}(n) & \underline{\mathring{Z}}_{1,2,l}(n) \end{pmatrix} \tag{112}$$

The N_(L)²N_(M) components z̊_(m,λ,l)(n) of Z̊_(l)(n) are obtained by filtering each component of x̊(n) (indexed by l) with every input-output path ĥ_(m,λ)(k,n) (indexed by λ and m, respectively) of the identified LEMS Ĥ(n). This implies a considerable computational effort scaling with approximately O(N_(L)²N_(M)(L_(H)+2L_(G))log(L_(H)+2L_(G))) when using fast convolution. This is comparable to the effort for determining S̊_(l)⁻¹(n)Z̊_(l)^(H)(n) in formula (114), which scales approximately with O(N_(L)³L_(G)) when using the recursive realization proposed in [14].

The cost function to be minimized for optimizing g _(l)(n) is then

$$\mathring{J}_l(n) = (1-\lambda_b) \sum_{i=0}^{n} \lambda_b^{n-i}\, \underline{\mathring{e}}_l^H(i)\, \underline{\mathring{e}}_l(i). \tag{113}$$
With a derivation and an approximation similar to [14] we obtain the update rule
$$\underline{g}_l(n) = \underline{g}_l(n-1) + \mu_b (1-\lambda_b)\, \underline{W}_{10}^H\, \underline{\mathring{S}}_l^{-1}(n)\, \underline{\mathring{Z}}_l^H(n)\, \underline{W}_{01}^H\, \underline{\mathring{e}}_l(n) \tag{114}$$
with the step size parameter 0≤μ_(b)≤1 and

$$\underline{\mathring{S}}_l(n) = \lambda_b\, \underline{\mathring{S}}_l(n-1) + (1-\lambda_b)\, \frac{1}{2} \left( \underline{\mathring{Z}}_l^H(n)\, \underline{\mathring{Z}}_l(n) + \underline{\mathring{R}}_l(n) \right), \tag{115}$$
where we use a Tikhonov regularization with a weighting factor δ_(b) by defining

$$\underline{\mathring{R}}_l(n) = \frac{\delta_b}{N_L}\, I_{N_L} \otimes \sum_{\lambda=0}^{N_L-1} \sum_{m=0}^{N_M-1} \underline{\mathring{Z}}_{m,\lambda,l}(n)\, \underline{\mathring{Z}}_{m,\lambda,l}^H(n). \tag{116}$$
The matrix S̊_(l)(n) is a sparse matrix, which reduces the computational effort drastically [14].

In the following, the provided DFT-domain approximate inverse filtering and the DFT-domain equalizer determination are presented. Similarly to the FxGFDAF, this algorithm is formulated for each original loudspeaker signal l independently, but in contrast to the FxGFDAF description, we consider the difference of the overall system response H(n)W₁₀g_(l)(n) to the desired system responses h_(l)⁽⁰⁾(n) directly and obtain
$$\underline{\tilde{e}}_l(n) = \underline{h}_l^{(0)}(n) - \underline{H}(n)\, \underline{W}_{10}\, \underline{g}_l(n-1) \tag{117}$$
with
$$h_{m,l}^{(0)} = \left( h_{m,l}^{(0)}(0),\, h_{m,l}^{(0)}(1),\, \ldots,\, h_{m,l}^{(0)}(2L_G-1) \right)^T, \tag{118}$$
$$\underline{h}_l^{(0)}(n) = \left( (F_{2L_G}\, h_{0,l}^{(0)})^T, \ldots, (F_{2L_G}\, h_{N_M-1,l}^{(0)})^T \right)^T.$$

The identified system responses of the LEMS are captured in H(n) according to the following example for N_(L)=3, N_(M)=2:

$$\underline{H}(n) = \begin{pmatrix} \underline{H}_{0,0}(n) & \underline{H}_{0,1}(n) & \underline{H}_{0,2}(n) \\ \underline{H}_{1,0}(n) & \underline{H}_{1,1}(n) & \underline{H}_{1,2}(n) \end{pmatrix} \tag{119}$$
with
$$\underline{H}_{m,\lambda}(n) = \mathrm{Diag}\left\{ F_{2L_G}\,(I_{L_G}, 0)^T\, \hat{h}_{m,\lambda}(n) \right\}, \tag{120}$$
where ĥ_(m,λ)(n) describes the identified impulse response from loudspeaker λ to microphone m, zero-padded or truncated to length L_(G). In contrast to formula (110) we need no windowing by W₀₁ in formula (117) because of the chosen impulse response lengths. To iteratively minimize the cost function
$$\tilde{J}_l(n) = \underline{\tilde{e}}_l^H(n)\, \underline{\tilde{e}}_l(n) \tag{121}$$
we again follow a derivation similar to [14] and set the gradient to zero. From this the formula

$$\underline{W}_{10}^H\, \underline{H}^H(n)\, \underline{H}(n)\, \underline{W}_{10}\, \underline{g}_l(n) = \underline{W}_{10}^H\, \underline{H}^H(n)\, \underline{H}(n)\, \underline{W}_{10}\, \underline{g}_l(n-1) + \underline{W}_{10}^H\, \underline{H}^H(n)\, \underline{\tilde{e}}_l(n) \tag{122}$$
is obtained as the system of equations to be solved for obtaining the optimum g_(l)(n). For multichannel systems this means an enormous computational effort. Therefore we propose the following adaptation rule for iteratively determining the optimum equalizer:
$$\underline{g}_l(n) := \underline{g}_l(n-1) + \mu_c\, \underline{W}_{10}^H \left( \underline{H}^H(n)\, \underline{H}(n) + \underline{R}(n) \right)^{-1} \underline{H}^H(n)\, \underline{\tilde{e}}_l(n), \tag{123}$$
where we introduced a Tikhonov regularization with a weighting factor δ_(c) with

$$\underline{R}(n) = \frac{\delta_c}{N_L}\, I_{N_L} \otimes \sum_{\lambda=0}^{N_L-1} \sum_{m=0}^{N_M-1} \underline{H}_{m,\lambda}(n)\, \underline{H}_{m,\lambda}^H(n). \tag{124}$$

Here, H^(H)(n)H(n) is a sparse matrix like S̊_(l)(n), allowing a computationally inexpensive inversion (see [26]). The update rule of formula (123) is similar to the approximation in [26], but in addition we introduce an iterative optimization of g_(l)(n), which becomes possible due to the consideration of ẽ_(l)(n).
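By way of illustration only, a strongly simplified form of the adaptation rule of formula (123) may be sketched in Python as follows. The sketch rests on several assumptions that are not part of the description: the windowing by W₁₀ is omitted, so that each DFT bin can be treated independently; the identified LEMS is given directly as one N_(M)×N_(L) matrix per bin; and the function name `inverse_filter_step` as well as the default values for the step size and the regularization weight are hypothetical.

```python
import numpy as np

def inverse_filter_step(H, h0, g, mu=0.5, delta=1e-3):
    """One bin-wise regularized update of the equalizers for one original signal l.

    H:  (F, N_M, N_L) complex, identified LEMS response per DFT bin.
    h0: (F, N_M) complex, desired free-field response for signal l.
    g:  (F, N_L) complex, current equalizer coefficients for signal l.
    Returns (g_new, err), where err is the error before the update, cf. (117).
    Simplification (assumption): no time-domain constraint W_10 is applied.
    """
    n_bins, _, n_ls = H.shape
    g_new = np.zeros(g.shape, dtype=complex)
    err = np.zeros(h0.shape, dtype=complex)
    for f in range(n_bins):
        Hf = H[f]
        # Deviation of the overall response from the desired response.
        err[f] = h0[f] - Hf @ g[f]
        # Tikhonov term in the spirit of (124): scaled identity per bin.
        energy = np.sum(np.abs(Hf) ** 2)
        R = (delta / n_ls) * energy * np.eye(n_ls)
        A = Hf.conj().T @ Hf + R
        # Solve the small N_L x N_L system instead of forming the inverse.
        g_new[f] = g[f] + mu * np.linalg.solve(A, Hf.conj().T @ err[f])
    return g_new, err
```

Because the error is recomputed from the current equalizers in every call, repeated calls can drive it towards zero even though the regularized inverse is only approximate, which reflects the iterative character of the update rule. A full implementation would additionally apply the windowing matrices to keep the equalizers limited to L_(G) time-domain taps.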

FIG. 6f illustrates a system for generating filtered loudspeaker signals for a plurality of loudspeakers of a loudspeaker-enclosure-microphone system according to an embodiment. In an embodiment, the system of FIG. 6f may be configured for listening room equalization, for example as described with reference to FIG. 6c, FIG. 6d or FIG. 6e. In another embodiment, the system of FIG. 6f may be configured for active noise cancellation, for example as described with reference to FIG. 6b.

The system of the embodiment of FIG. 6f comprises a filter unit 680 and an apparatus 600 for providing a current loudspeaker-enclosure-microphone system description. Moreover, FIG. 6f illustrates a LEMS 690.

The apparatus 600 for providing the current loudspeaker-enclosure-microphone system description is configured to provide a current loudspeaker-enclosure-microphone system description of the loudspeaker-enclosure-microphone system to the filter unit 680.

The filter unit 680 is configured to adjust a loudspeaker signal filter based on the current loudspeaker-enclosure-microphone system description to obtain an adjusted filter. Moreover, the filter unit 680 is arranged to receive a plurality of loudspeaker input signals. Furthermore, the filter unit 680 is configured to filter the plurality of loudspeaker input signals by applying the adjusted filter on the loudspeaker input signals to obtain the filtered loudspeaker signals.

FIG. 6g illustrates a system for generating filtered loudspeaker signals for a plurality of loudspeakers of a loudspeaker-enclosure-microphone system according to an embodiment showing more details. The system of FIG. 6g may be employed for listening room equalization. In FIG. 6g, the first transformation unit 630, the second transformation unit 640, the system description generator 650, its system description application unit 660, its error determiner 670 and its system description generation unit 680 correspond to the first transformation unit 130, the second transformation unit 140, the system description generator 150, the system description application unit 160, the error determiner 170 and the system description generation unit 180 of FIG. 1b, respectively.

Furthermore, the system of FIG. 6g comprises a filter unit 690. As already described with reference to FIG. 6f, the filter unit 690 is configured to adjust a loudspeaker signal filter based on the current loudspeaker-enclosure-microphone system description to obtain an adjusted filter. Moreover, the filter unit 690 is arranged to receive a plurality of loudspeaker input signals. Furthermore, the filter unit 690 is configured to filter the plurality of loudspeaker input signals by applying the adjusted filter on the loudspeaker input signals to obtain the filtered loudspeaker signals.

In an embodiment, a method for determining at least two filter configurations of a loudspeaker signal filter for at least two different loudspeaker-enclosure-microphone system states is provided.

For example, the loudspeakers and the microphones of the loudspeaker-enclosure-microphone system may be arranged in a concert hall. When the concert hall is crowded with people and all seats of the concert hall are occupied, the loudspeaker-enclosure-microphone system may be in a first state, e.g. the impulse responses regarding the output loudspeaker signals and the recorded microphone signals may have first values. When only half of the seats of the concert hall are occupied, the loudspeaker-enclosure-microphone system may be in a second state, e.g. the impulse responses regarding the output loudspeaker signals and the recorded microphone signals may have second values.

According to the method, a first loudspeaker-enclosure-microphone system description of the loudspeaker-enclosure-microphone system is determined, when the loudspeaker-enclosure-microphone system has a first state (e.g. the impulse responses of the loudspeaker signals and the recorded microphone signals have first values, e.g. the concert hall is crowded). Then a first filter configuration of a loudspeaker signal filter is determined based on the first loudspeaker-enclosure-microphone system description, for example, such that the loudspeaker signal filter realizes acoustic echo cancellation. The first filter configuration is then stored in a memory.

Then, a second loudspeaker-enclosure-microphone system description of the loudspeaker-enclosure-microphone system is determined, when the loudspeaker-enclosure-microphone system has a second state (e.g. the impulse responses of the loudspeaker signals and the recorded microphone signals have second values, e.g. only half of the seats of the concert hall are occupied). Then, a second filter configuration of the loudspeaker signal filter is determined based on the second loudspeaker-enclosure-microphone system description, for example, such that the loudspeaker signal filter realizes acoustic echo cancellation. The second filter configuration is then stored in the memory.

The loudspeaker signal filter itself may be arranged to filter a plurality of loudspeaker input signals to obtain a plurality of filtered loudspeaker signals for steering a plurality of loudspeakers of a loudspeaker-enclosure-microphone system.

For example, under test conditions, a first filter configuration may be determined when the loudspeaker-enclosure-microphone system has a first state, and a second filter configuration may be determined when the loudspeaker-enclosure-microphone system has a second state. Later, under real conditions, either the first or the second filter configuration may be used for acoustic echo cancellation depending on whether, e.g., the concert hall is crowded or whether only half of the seats are occupied.

The performance and the properties of the algorithms according to the above-described embodiments for providing a loudspeaker-enclosure-microphone system description will now be evaluated. To this end, the results from an experimental evaluation of the proposed approach are presented. At first, the results for an experiment under optimal conditions are considered.

For the simulation of the LEMS, we used the measured impulse responses for the LEMS described above with N_(L)=48 loudspeakers and N_(M)=10 microphones. Using a sampling frequency of f_(s)=11025 Hz, the impulse responses were truncated to 3764 samples. This is slightly shorter than the modeled length of the impulse responses, which is L_(H)=4096, so effects resulting from an unmodeled impulse response tail are absent. The loudspeaker signals were determined by using WFS [1] so that plane waves could be synthesized within the loudspeaker array. The incidence angles of the plane waves were chosen to be φ₁=0 and φ₂=π/2, where the plane waves were alternatingly or simultaneously synthesized to simulate a change of G_(RS) over time. The length of all FIR filters used for the WFS was L_(G)=135. To reduce the computational complexity, we used the approximations of both algorithms described by (53) and (58), respectively, such that the respective matrices can be inverted frequency bin-wise [14]. Furthermore, we used a frame shift L_(F) of 512 samples and a forgetting factor λ_(a) of 0.95, while both algorithms were regularized with β=0.05. For the modified GFDAF the parameters β₀=2, β₁=0.01, and β₂=0.1 were chosen. To avoid divergence at the beginning of the adaptation we used S(0)=σ̂I with the identity matrix I of appropriate dimensions and σ̂ being an approximation of the steady state mean value of the diagonal entries of S(n) after the first four seconds of the experiment. This can be considered as a nearly optimum initialization value. For the comparison the ERLE (17) and the normalized misalignment (22) for the different approaches are shown.
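By way of illustration only, the two figures of merit used for the comparison may be computed as in the following Python sketch. Formulas (17) and (22) are not reproduced in this part of the description, so the sketch assumes the commonly used definitions (ratio of microphone signal energy to residual error energy for the ERLE, and relative Frobenius-norm distance of the coefficients for the normalized misalignment); the exact formulas of the description may differ, for example by being evaluated in the wave domain. The function names are hypothetical.

```python
import numpy as np

def erle_db(d, e):
    """Echo return loss enhancement in dB (assumed common definition).

    d: microphone (echo) signal samples, e: residual error samples,
    both arrays of the same shape.
    """
    return 10.0 * np.log10(np.sum(np.abs(d) ** 2) / np.sum(np.abs(e) ** 2))

def normalized_misalignment_db(h_true, h_est):
    """Normalized system misalignment in dB (assumed common definition).

    h_true, h_est: arrays of identical shape holding the true and the
    identified impulse responses of all loudspeaker-microphone paths.
    """
    return 20.0 * np.log10(np.linalg.norm(h_true - h_est)
                           / np.linalg.norm(h_true))
```

A larger ERLE indicates better echo suppression, whereas a more negative misalignment indicates that the identified system is closer to the true LEMS; the two measures need not move together when the identification problem is underdetermined.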

Now, model validation is provided. The results shown are used to validate the proposed model and the improved system description performance of the proposed algorithm.

Mutually uncorrelated white noise signals were used as source signals for the synthesized plane waves. The timeline for this experiment can be described as follows: For the time span 0≤t<5 s only one plane wave with an incidence angle of φ₁ was synthesized. For the time span 5≤t<10 s another plane wave with an incidence angle of φ₂ was synthesized. For 10≤t<15 s both plane waves were simultaneously synthesized.

The results for this experiment are shown in FIG. 7. It can be seen that there is a breakdown in ERLE for both considered approaches at t=5 s when the first plane wave is no longer synthesized and the second one is synthesized instead. A smaller breakdown can be seen at t=10 s when the first plane wave is synthesized again in addition to the second one. The breakdown at t=5 s can be expected for any approach because new properties of the LEMS are revealed when the second plane wave is synthesized. Those properties are then to be identified by the respective adaptation algorithm. The second breakdown can, at least in theory, be avoided because solutions for both plane waves were already found separately. Hence, this breakdown only depends on how much of the solution for the first plane wave an algorithm “forgets” to obtain a solution for the second plane wave.

As a cost for the reduced misalignment shown in the lower plot, the modified GFDAF shows a slightly slower increase in ERLE during the first five seconds. However, whenever the source activity changes, there is a somewhat smaller breakdown in ERLE for the modified GFDAF. Additionally, the modified GFDAF shows a larger steady state ERLE compared to the original GFDAF. This is due to the fact that both algorithms were approximated and only an exact implementation of (53) would be guaranteed to reach the global optimum, i.e., maximize the ERLE. So both algorithms converge to a local minimum, and the lower misalignment of the modified GFDAF is an advantage, as it denotes a lower distance to the perfect solution, which is a global optimum.

In the lower part of FIG. 7, it can be clearly seen that the modified GFDAF outperforms the original GFDAF regarding the normalized misalignment. The relatively low absolute performance of both algorithms is not surprising, as the identification of the LEMS is a severely underdetermined problem in the given scenario, according to (21). Evaluating (23) we obtain only −0.2 dB as a lower bound for the normalized misalignment in this scenario. From this we can see that the original GFDAF can exploit almost all information provided by the observed signals when achieving −0.16 dB. The reduction of the misalignment by an additional 1.4 dB by the modified version can be attributed to the information provided by the wave-domain assumptions on H̃(n). As the misalignment is relatively high for both approaches, no correlation with the results for the ERLE can be seen.

For the comparison with a conventional AEC we repeated the same experiment using T₁=I and T₂=I with the respective dimensions and the original GFDAF. As the obtained results almost perfectly coincide with the results for wave-domain AEC with the original GFDAF, they are not shown in FIG. 7. This behaviour is remarkable, as the conclusion may be drawn that a transformation of the used signal representations to the wave domain alone does not automatically lead to a different convergence behaviour. Nevertheless, using WDAF is still advantageous regardless of the used adaptation algorithm, as the computational effort for adaptation can be reduced by an approximative LEMS model.

In the following, results for two experiments with suboptimal conditions are presented to show the gain in robustness of the concepts provided by embodiments.

Up to now the experiments were conducted under almost optimal conditions, e.g., in absence of noise or interferences in the microphone signal and using a nearly optimum initialization value for S(0). In this section we present results for documenting the robustness of the proposed approach with two different experiments under suboptimal conditions.

At first, the experiment of the previous subsection was repeated, starting the adaptation with a suboptimal initialization value S(0)=σ̂I/10000. Such a suboptimal choice is more realistic because the chosen initialization value for S(n) used in the previous section depends on knowledge which is not available in practice. The results for this experiment are depicted in FIG. 8.

The ERLE curves show for both approaches a slower convergence in the first 5 seconds compared to the previous experiment, although the modified GFDAF is less affected in this regard. After the transition, the difference between both algorithms becomes even more evident. While the modified GFDAF only shows a short breakdown in ERLE, the original GFDAF takes significantly longer to recover. Moreover, the original GFDAF shows a significantly lower steady state ERLE than the modified version during the entire experiment. Considering the achieved misalignment for both approaches, this behavior can be explained: The original GFDAF suffers from a bad initial convergence and cannot recover throughout the whole experiment, while the modified GFDAF is only slightly affected.

In the second experiment short impulses (50 ms) of noise were introduced into the microphone signal, leading to two adaptation steps in the presence of an interfering signal. This experiment was chosen because in practice an undetected double-talk situation may also lead to an adaptation in the presence of an interfering signal, and double-talk detectors are usually not perfectly reliable. Although the signals used here differ significantly from the signals present in practice, the effect on the convergence behaviour of the adaptation algorithms can be expected to be similar. The interfering signal used was generated by convolving a single white noise signal with impulse responses measured for the considered microphone array in a completely different setup. This was done to model an interferer recorded by the microphone array rather than an interference taking effect on the microphone signals directly. The noise power was chosen to be 6 dB relative to the unaltered microphone signal. The results for this experiment can be seen in FIG. 9. The timeline for this experiment differs from the previous ones. We introduced the noise interferences at t=5 s and t=15 s. From the beginning to t=25 s the first plane wave (φ₁=0) was synthesized and from t=25 s until the end the second plane wave (φ₂=π/2) was synthesized. It can be seen that both algorithms are equally affected by the impulsive noise. However, in contrast to the original GFDAF, the modified GFDAF shows a significantly larger ERLE when having recovered from the disturbances. The difference in behavior is even more evident when there is a transition between both waves. There, the original GFDAF shows a pronounced breakdown in ERLE while the modified GFDAF can recover quickly. Again, the normalized misalignment may be used to explain the observed behaviour. It can be clearly seen that the original GFDAF shows a growing misalignment with every disturbance while the modified GFDAF is not sensitive to this interference.
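By way of illustration only, an interfering signal of the kind used in this experiment may be generated as in the following Python sketch: a single noise burst is convolved with one impulse response per microphone and then scaled to a given level relative to the microphone signal. The description does not state how the relative power was measured, so the sketch assumes that the mean power over all channels and all samples is compared; the function name `make_interferer` and the array layout are likewise assumptions.

```python
import numpy as np

def make_interferer(noise, irs, mic, level_db):
    """Create a multichannel interferer from a single noise signal.

    noise: (K,) noise signal, zero outside the burst.
    irs:   (N_M, L) impulse responses from the interferer to each microphone.
    mic:   (N_M, K) unaltered microphone signals used as power reference.
    level_db: desired interferer power relative to the microphone signal.
    Power is measured as the mean over all channels and samples (assumption).
    """
    num_samples = mic.shape[1]
    # One convolution per microphone, truncated to the signal length.
    raw = np.stack([np.convolve(noise, ir)[:num_samples] for ir in irs])
    gain = np.sqrt(np.mean(mic ** 2) / np.mean(raw ** 2)
                   * 10.0 ** (level_db / 10.0))
    return gain * raw
```

The scaled interferer would then simply be added to the microphone signals at the chosen time instants; at f_(s)=11025 Hz a burst of 50 ms corresponds to roughly 551 samples.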

Adaptation algorithms based on robust statistics (see [24]) could also be used to increase robustness in such a scenario. However, as they only use the information provided by the observed signals, they can be expected to principally show the same behaviour as the original GFDAF, although the misalignment introduced by the interferences should be smaller.

Improved concepts for AEC in the wave domain maintaining robustness in the presence of the nonuniqueness problem have been presented.

It has been shown that the nonuniqueness problem is typically highly relevant for AEC in combination with massive multichannel reproduction systems. Considering a concentric setup of a circular loudspeaker array and a circular microphone array, it was shown that the spatial DFT can be used as transform to the wave domain. Using a model based on these transforms, distinct properties of the LEMS model were investigated. A modified version of the GFDAF was presented to exploit these properties in order to significantly reduce the consequences of the nonuniqueness problem. Results from an experimental evaluation support the claim of an increased robustness and showed an improved system description performance.

Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus.

Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed.

Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.

Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.

Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier or a non-transitory storage medium.

In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.

A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein.

A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.

A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.

A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.

In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods may be performed by any hardware apparatus.

While this invention has been described in terms of several embodiments,there are alterations, permutations, and equivalents which will beapparent to others skilled in the art and which fall within the scope ofthis invention. It should also be noted that there are many alternativeways of implementing the methods and compositions of the presentinvention. It is therefore intended that the following appended claimsbe interpreted as including all such alterations, permutations, andequivalents as fall within the true spirit and scope of the presentinvention.

LITERATURE

-   [1] A. Berkhout, D. De Vries, and P. Vogel, “Acoustic control by wave field synthesis”, J. Acoust. Soc. Am. 93, 2764-2778 (1993).
-   [2] J. Daniel, “Spatial sound encoding including near field effect: Introducing distance coding filters and a variable, new ambisonic format”, in 23rd International Conference of the Audio Eng. Soc. (2003).
-   [3] M. Sondhi and D. Berkley, “Silencing echoes on the telephone network”, Proceedings of the IEEE 68, 948-963 (1980).
-   [4] B. Kingsbury and N. Morgan, “Recognizing reverberant speech with RASTA-PLP”, in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), volume 2, 1259-1262 (Munich, Germany) (1997).
-   [5] M. Sondhi, D. Morgan, and J. Hall, “Stereophonic acoustic echo cancellation - an overview of the fundamental problem”, IEEE Signal Process. Lett. 2, 148-151 (1995).
-   [6] J. Benesty, D. Morgan, and M. Sondhi, “A better understanding and an improved solution to the specific problems of stereophonic acoustic echo cancellation”, IEEE Trans. Speech Audio Process. 6, 156-165 (1998).
-   [7] A. Gilloire and V. Turbin, “Using auditory properties to improve the behaviour of stereophonic acoustic echo cancellers”, in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), volume 6, 3681-3684 (Seattle, Wash.) (1998).
-   [8] T. Gänsler and P. Eneroth, “Influence of audio coding on stereophonic acoustic echo cancellation”, in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), volume 6, 3649-3652 (Seattle, Wash.) (1998).
-   [9] D. Morgan, J. Hall, and J. Benesty, “Investigation of several types of nonlinearities for use in stereo acoustic echo cancellation”, IEEE Trans. Speech Audio Process. 9, 686-696 (2001).
-   [10] M. Ali, “Stereophonic acoustic echo cancellation system using time-varying all-pass filtering for signal decorrelation”, in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), volume 6, 3689-3692 (Seattle, Wash.) (1998).
-   [11] J. Herre, H. Buchner, and W. Kellermann, “Acoustic echo cancellation for surround sound using perceptually motivated convergence enhancement”, in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), volume 1, I-17-I-20 (Honolulu, Hi.) (2007).
-   [12] S. Shimauchi and S. Makino, “Stereo echo cancellation algorithm using imaginary input-output relationships”, in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), volume 2, 941-944 (Atlanta, Ga.) (1996).
-   [13] H. Buchner, S. Spors, and W. Kellermann, “Wave-domain adaptive filtering: acoustic echo cancellation for full duplex systems based on wave-field synthesis”, in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), volume 4, IV-117-IV-120 (Montreal, Canada) (2004).
-   [14] H. Buchner, J. Benesty, and W. Kellermann, “Multichannel frequency-domain adaptive algorithms with application to acoustic echo cancellation”, in Adaptive Signal Processing: Application to Real-World Problems, edited by J. Benesty and Y. Huang (Springer, Berlin) (2003).
-   [15] H. Buchner and S. Spors, “A general derivation of wave-domain adaptive filtering and application to acoustic echo cancellation”, in Asilomar Conference on Signals, Systems, and Computers, 816-823 (2008).
-   [16] Y. Huang, J. Benesty, and J. Chen, Acoustic MIMO Signal Processing (Springer, Berlin) (2006).
-   [17] C. Breining, P. Dreiseitel, E. Hänsler, A. Mader, B. Nitsch, H. Puder, T. Schertler, G. Schmidt, and J. Tilp, “Acoustic echo control: An application of very-high-order adaptive filters”, IEEE Signal Process. Mag. 16, 42-69 (1999).
-   [18] S. Spors, H. Buchner, R. Rabenstein, and W. Herbordt, “Active listening room compensation for massive multichannel sound reproduction systems using wave-domain adaptive filtering”, J. Acoust. Soc. Am. 122, 354-369 (2007).
-   [19] H. Teutsch, Modal Array Signal Processing: Principles and Applications of Acoustic Wavefield Decomposition (Springer, Berlin) (2007).
-   [20] P. Morse and H. Feshbach, Methods of Theoretical Physics (McGraw-Hill, New York) (1953).
-   [21] C. Balanis, Antenna Theory (Wiley, New York) (1997).
-   [22] M. Abramowitz and I. Stegun, Handbook of Mathematical Functions (Dover, New York) (1972).
-   [23] M. Schneider and W. Kellermann, “A wave-domain model for acoustic MIMO systems with reduced complexity”, in Third Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA) (Edinburgh, UK) (2011).
-   [24] H. Buchner, J. Benesty, T. Gänsler, and W. Kellermann, “Robust Extended Multidelay Filter and Double-Talk Detector for Acoustic Echo Cancellation”, IEEE Trans. Audio, Speech, Language Process. 14, 1633-1644 (2006).
-   [25] S. Goetze, M. Kallinger, A. Mertins, and K. D. Kammeyer, “Multichannel listening-room compensation using a decoupled filtered-X LMS algorithm”, in Proc. Asilomar Conference on Signals, Systems, and Computers, 811-815 (October 2008).
-   [26] O. Kirkeby, P. A. Nelson, H. Hamada, and F. Orduna-Bustamante, “Fast deconvolution of multichannel systems using regularization”, IEEE Trans. Speech Audio Process. 6, no. 2, 189-194 (March 1998).
-   [27] S. Spors, H. Buchner, and R. Rabenstein, “A novel approach to active listening room compensation for wave field synthesis using wave-domain adaptive filtering”, in Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP), volume 4, ISSN 1520-6149, IV-29-IV-32 (2004).
-   [28] S. Spors and H. Buchner, “Efficient massive multichannel active noise control using wave-domain adaptive filtering”, in 3rd International Symposium on Communications, Control and Signal Processing (ISCCSP 2008), IEEE, 1480-1485 (2008).

The invention claimed is:
 1. An apparatus for providing a current loudspeaker-enclosure-microphone system description of a loudspeaker-enclosure-microphone system, wherein the loudspeaker-enclosure-microphone system comprises a plurality of loudspeakers and a plurality of microphones, and wherein the apparatus comprises: a first transformation unit for generating a plurality of wave-domain loudspeaker audio signals, wherein the first transformation unit is configured to generate each of the wave-domain loudspeaker audio signals based on a plurality of time-domain loudspeaker audio signals and based on one or more of a plurality of loudspeaker-signal-transformation values, said one or more of the plurality of loudspeaker-signal-transformation values being assigned to said generated wave-domain loudspeaker audio signal, a second transformation unit for generating a plurality of wave-domain microphone audio signals, wherein the second transformation unit is configured to generate each of the wave-domain microphone audio signals based on a plurality of time-domain microphone audio signals and based on one or more of a plurality of microphone-signal-transformation values, said one or more of the plurality of microphone-signal-transformation values being assigned to said generated wave-domain loudspeaker audio signal, and a system description generator for generating the current loudspeaker-enclosure-microphone system description based on the plurality of wave-domain loudspeaker audio signals, and based on the plurality of wave-domain microphone audio signals, wherein the system description generator is configured to generate the current loudspeaker-enclosure-microphone system description based on a plurality of coupling values, wherein each of the plurality of coupling values is assigned to one of a plurality of wave-domain pairs, each of the plurality of wave-domain pairs being a pair of one of the plurality of loudspeaker-signal-transformation values and one of the plurality of microphone-signal-transformation values, wherein the system description generator is configured to determine each coupling value assigned to a wave-domain pair of the plurality of wave-domain pairs by determining for said wave-domain pair at least one relation indicator indicating a relation between said one of the one or more loudspeaker-signal-transformation values of said wave-domain pair and said one of the microphone-signal-transformation values of said wave-domain pair to generate the current loudspeaker-enclosure-microphone system description.
 2. The apparatus according to claim 1, wherein the system description generator comprises a system description application unit, an error determiner and a system description generation unit, wherein the system description application unit is configured to generate a plurality of wave-domain microphone estimation signals based on the wave-domain loudspeaker audio signals and based on a previous loudspeaker-enclosure-microphone system description of the loudspeaker-enclosure-microphone system, wherein the error determiner is configured to determine a plurality of wave-domain error signals based on the plurality of wave-domain microphone audio signals and based on the plurality of wave-domain microphone estimation signals, wherein the system description generation unit is configured to generate the current loudspeaker-enclosure-microphone system description based on the wave-domain loudspeaker audio signals, based on the plurality of error signals and based on the plurality of coupling values.
 3. The apparatus according to claim 2, wherein the first transformation unit is configured to generate each of the wave-domain loudspeaker audio signals based on the plurality of time-domain loudspeaker audio signals and based on the one or more of the plurality of loudspeaker-signal-transformation values, wherein the plurality of loudspeaker-signal-transformation values is a plurality of loudspeaker-signal-transformation mode orders, wherein the second transformation unit is configured to generate each of the wave-domain microphone audio signals based on the plurality of time-domain microphone audio signals and based on the one or more of the plurality of microphone-signal-transformation values, wherein the plurality of microphone-signal-transformation values is a plurality of microphone-signal-transformation mode orders, and wherein the system description generation unit is configured to generate the loudspeaker-enclosure-microphone system description based on a first coupling value of the plurality of coupling values, when a first relation value indicating a first difference between a first loudspeaker-signal-transformation mode order of the plurality of loudspeaker-signal-transformation mode orders and a first microphone-signal-transformation mode order of the plurality of microphone-signal mode orders comprises a first difference value, wherein the system description generation unit is configured to assign the first coupling value to a first wave-domain pair of the plurality of wave-domain pairs, when the first relation value comprises the first difference value, wherein the first wave-domain pair is a pair of the first loudspeaker-signal-transformation mode order and the first microphone-signal-transformation mode order, and wherein the first relation value is one of the plurality of relation indicators, and wherein the system description generation unit is configured to generate the loudspeaker-enclosure-microphone system description based on a second coupling value of the plurality of coupling values, when a second relation value indicating a second difference between a second loudspeaker-signal-transformation mode order of the plurality of loudspeaker-signal-transformation mode orders and a second microphone-signal-transformation mode order of the plurality of microphone-signal-transformation mode orders comprises a second difference value, being different from the first difference value, wherein the system description generation unit is configured to assign the second coupling value to the second wave-domain pair of the plurality of wave-domain pairs, when the second relation value comprises the second difference value, wherein the second wave-domain pair is a pair of the second loudspeaker-signal-transformation mode order of the plurality of loudspeaker-signal-transformation mode orders and the second microphone-signal-transformation mode order of the plurality of microphone-signal-transformation mode orders, wherein the second wave-domain pair is different from the first wave-domain pair, and wherein the second relation value is one of the plurality of relation indicators.
 4. The apparatus according to claim 3, wherein the system description generation unit is configured to generate the current loudspeaker-enclosure-microphone system description based on the first coupling value of the first wave-domain pair, when the first loudspeaker-signal-transformation mode order is equal to the first microphone-signal-transformation mode order, and wherein the system description generation unit is configured to generate the current loudspeaker-enclosure-microphone system description based on the second coupling value of the second wave-domain pair, when the second loudspeaker-signal-transformation mode order is not equal to the second microphone-signal-transformation mode order.
 5. The apparatus according to claim 3, wherein the system description generation unit is configured to generate the current loudspeaker-enclosure-microphone system description based on the first coupling value of the first wave-domain pair, when the first loudspeaker-signal-transformation mode order is equal to the first microphone-signal-transformation mode order, wherein the system description generation unit is configured to generate the current loudspeaker-enclosure-microphone system description based on the second coupling value of the second wave-domain pair, when the second loudspeaker-signal-transformation mode order is not equal to the second microphone-signal-transformation mode order, and when the absolute difference between the second loudspeaker-signal-transformation mode order and the second microphone-signal-transformation mode order is smaller than or equal to a predefined threshold value, and wherein the system description generation unit is configured to generate the current loudspeaker-enclosure-microphone system description based on a third coupling value of a third wave-domain pair being a pair of a third loudspeaker-signal-transformation mode order of the plurality of loudspeaker-signal-transformation mode orders and a third microphone-signal-transformation mode order of the plurality of microphone-signal-transformation mode orders, when the third loudspeaker-signal-transformation mode order is not equal to the third microphone-signal-transformation mode order, and when an absolute difference between the third loudspeaker-signal-transformation mode order and the third microphone-signal-transformation mode order is greater than the predefined threshold value.
 6. The apparatus according to claim 5, wherein the first coupling value is a first number β₁, wherein the second coupling value is a second value β₂, wherein 0≤β₁<β₂≤1, and wherein the third coupling value is 1.0.
 7. The apparatus according to claim 3, wherein the system description generation unit is configured to generate a current loudspeaker-enclosure-microphone system description matrix based on a previous loudspeaker-enclosure-microphone system description matrix, wherein the previous loudspeaker-enclosure-microphone system description matrix represents the previous loudspeaker-enclosure-microphone system description, and wherein the current loudspeaker-enclosure-microphone system description matrix represents the current loudspeaker-enclosure-microphone system description.
 8. The apparatus according to claim 7, wherein the system description generation unit is configured to generate the current loudspeaker-enclosure-microphone system description matrix based on the previous loudspeaker-enclosure-microphone system description matrix, wherein the current loudspeaker-enclosure-microphone system description matrix comprises a plurality of current matrix components $\underline{\tilde{h}}_{m}(n)$, wherein the previous loudspeaker-enclosure-microphone system description matrix comprises a plurality of previous matrix components $\underline{\tilde{h}}_{m}(n-1)$, and wherein the system description generation unit is configured to determine the current matrix components $\underline{\tilde{h}}_{m}(n)$ according to the formula $\underline{\tilde{h}}_{m}(n) = \underline{\tilde{h}}_{m}(n-1) + (1-\lambda_{a})\left(S(n) + C_{m}(n)\right)^{-1} \cdot \left(W_{10}^{H} X^{H}(n) W_{01}^{H}\,\underline{\tilde{e}}_{m}(n) - C_{m}(n)\,\underline{\tilde{h}}_{m}(n-1)\right)$, wherein $C_{m}(n)$ is a coupling matrix, comprising a plurality of coupling matrix coefficients, wherein $X^{H}(n)$ is the conjugate transpose matrix of loudspeaker signal matrix $X(n)$, wherein $X(n)$ is a loudspeaker signal matrix depending on the plurality of wave-domain loudspeaker audio signals, wherein $W_{01}$ is a first windowing matrix for time-domain windowing, wherein $W_{10}$ is a second windowing matrix for time-domain windowing, and wherein the system description generation unit is configured to determine the matrix $S(n)$ according to the formula $S(n) = \lambda_{a} S(n-1) + (1-\lambda_{a})\, W_{10}^{H} X^{H}(n) W_{01}^{H} W_{01} X(n) W_{10}$, wherein $\lambda_{a}$ is a number, wherein $0 \le \lambda_{a} < 1$.
 9. The apparatus according to claim 8, wherein the weighting function $\omega_{c}$ is defined by the formula $\omega_{c}(n) = \frac{\sum_{m=0}^{N_{M}-1} J_{m}(n-1)}{\max\left\{ \sum_{m=0}^{N_{M}-1} \underline{\tilde{h}}_{m}^{H}(n-1)\,\underline{\tilde{h}}_{m}(n-1),\; 1 \right\}}$, wherein $J_{m}(n) = (1-\lambda_{a}) \sum_{i=0}^{n} \lambda_{a}^{n-i}\, \underline{\tilde{e}}_{m}^{H}(i)\,\underline{\tilde{e}}_{m}(i)$, wherein $\underline{\tilde{e}}_{m}^{H}(i)$ represents the conjugate transpose of $\underline{\tilde{e}}_{m}(i)$, and wherein $\underline{\tilde{e}}_{m}(i)$ indicates one of the plurality of error signals.
 10. The apparatus according to claim 8, wherein the coupling matrix $C_{m}(n)$ is defined by the formula $C_{m}(n) = \beta_{0}\,\omega_{c}(n)\,\mathrm{Diag}\{c_{0}(n), c_{1}(n), \ldots, c_{N_{L} L_{H} - 1}(n)\}$, wherein $\mathrm{Diag}\{c_{0}(n), c_{1}(n), \ldots, c_{N_{L} L_{H} - 1}(n)\}$ indicates a diagonal matrix, wherein $c_{0}(n)$ is the first coupling value or the second coupling value indicated by the coupling information or another coupling value, being different from the first and the second coupling value, and being indicated by the coupling information, wherein $c_{1}(n)$ is the first coupling value or the second coupling value indicated by the coupling information or another coupling value, being different from the first and the second coupling value, and being indicated by the coupling information, wherein $c_{N_{L} L_{H} - 1}(n)$ is the first coupling value or the second coupling value indicated by the coupling information or another coupling value, being different from the first and the second coupling value, and being indicated by the coupling information, wherein $\beta_{0}$ is a scale parameter, wherein $0 \le \beta_{0}$, wherein $\omega_{c}(n)$ is a weighting function returning a number which is greater than 0, and wherein n is a time index.
 11. The apparatus according to claim 10, wherein the system description generation unit is configured to determine the coupling matrix $C_{m}(n)$ defined by the formula $C_{m}(n) = \beta_{0}\,\omega_{c}(n)\,\mathrm{Diag}\{c_{0}(n), c_{1}(n), \ldots, c_{N_{L} L_{H} - 1}(n)\}$, wherein $c_{0}(n), c_{1}(n), \ldots, c_{N_{L} L_{H} - 1}(n)$ are defined by: $c_{q}(n) = \begin{cases} \beta_{1} & \text{when } \Delta m(q) = 0, \\ \beta_{2} & \text{when } \Delta m(q) = 1, \\ 1 & \text{elsewhere,} \end{cases} \qquad (60)$ wherein 0≤β₁<β₂≤1, wherein β₁ is the first coupling value, wherein β₂ is the second coupling value, wherein q indicates the first wave-domain pair, the second wave-domain pair or a different wave-domain pair of one of the plurality of loudspeaker-signal-transformation mode orders and one of the plurality of microphone-signal-transformation mode orders, and wherein Δm(q) is a relation indicator of said wave-domain pair q, wherein Δm(q) indicates a difference between the loudspeaker-signal-transformation mode order of said wave-domain pair q and the microphone-signal-transformation mode order of said wave-domain pair q.
 12. The apparatus according to claim 11, wherein Δm(q) is defined by the formula: $\Delta m(q) = \min\left( \left| \lfloor q/L_{H} \rfloor - m \right|,\; \left| \lfloor q/L_{H} \rfloor - m - N_{L} \right| \right)$, wherein m indicates one of the plurality of microphone-signal-transformation mode orders, wherein $N_{L}$ indicates the number of loudspeakers of the loudspeaker-enclosure-microphone system, and wherein $L_{H}$ indicates a length of the discrete-time impulse response of the loudspeaker-enclosure-microphone system from one of the plurality of loudspeakers of the loudspeaker-enclosure-microphone system to one of the microphones of the loudspeaker-enclosure-microphone system.
 13. The apparatus according to claim 3, wherein the first transformation unit is configured to generate the plurality of wave-domain loudspeaker audio signals by employing the formula $\sum_{\lambda=0}^{N_{L}-1} \hat{P}_{\lambda}^{(x)}(j\omega)\, e^{-j l' \lambda \frac{2\pi}{N_{L}}}$ wherein $N_{L}$ indicates the number of loudspeakers of the loudspeaker-enclosure-microphone system, wherein $l'$ indicates one of the plurality of loudspeaker-signal-transformation mode orders, and wherein $\hat{P}_{\lambda}^{(x)}(j\omega)$ indicates a spectrum of a sound field emitted by loudspeaker λ.
 14. The apparatus according to claim 3, wherein the second transformation unit is configured to generate the plurality of wave-domain microphone audio signals by employing the formula $\sum_{\mu=0}^{N_{M}-1} \hat{P}_{\mu}^{(d)}(j\omega)\, e^{-j m' \mu \frac{2\pi}{N_{M}}}$ wherein $N_{M}$ indicates the number of microphones of the loudspeaker-enclosure-microphone system, wherein $m'$ indicates one of the plurality of microphone-signal-transformation mode orders, and wherein $\hat{P}_{\mu}^{(d)}(j\omega)$ indicates a spectrum of a sound pressure measured by microphone μ.
 15. A system, comprising: a plurality of loudspeakers of a loudspeaker-enclosure-microphone system, a plurality of microphones of the loudspeaker-enclosure-microphone system, and an apparatus according to claim 1, wherein the plurality of loudspeakers are arranged to receive a plurality of loudspeaker input signals, wherein the apparatus according to claim 1 is arranged to receive the plurality of loudspeaker input signals, wherein the plurality of microphones are configured to record a plurality of microphone input signals, wherein the apparatus according to claim 1 is arranged to receive the plurality of microphone input signals, and wherein the apparatus according to claim 1 is configured to adjust a loudspeaker-enclosure-microphone system description based on the received loudspeaker input signals and based on the received microphone input signals.
 16. A system for generating filtered loudspeaker signals for a plurality of loudspeakers of a loudspeaker-enclosure-microphone system, wherein the system comprises: a filter unit, and an apparatus according to claim 1, wherein the apparatus according to claim 1 is configured to provide a current loudspeaker-enclosure-microphone system description of the loudspeaker-enclosure-microphone system to the filter unit, wherein the filter unit is configured to adjust a loudspeaker signal filter based on the current loudspeaker-enclosure-microphone system description to achieve an adjusted filter, wherein the filter unit is arranged to receive a plurality of loudspeaker input signals, and wherein the filter unit is configured to filter the plurality of loudspeaker input signals by applying the adjusted filter on the loudspeaker input signals to acquire the filtered loudspeaker signals.
 17. A method for providing a current loudspeaker-enclosure-microphone system description of a loudspeaker-enclosure-microphone system, wherein the loudspeaker-enclosure-microphone system comprises a plurality of loudspeakers and a plurality of microphones, and wherein the method comprises: generating a plurality of wave-domain loudspeaker audio signals by generating each of the wave-domain loudspeaker audio signals based on a plurality of time-domain loudspeaker audio signals and based on one or more of a plurality of loudspeaker-signal-transformation values, said one or more of the plurality of loudspeaker-signal-transformation values being assigned to said generated wave-domain loudspeaker audio signal, generating a plurality of wave-domain microphone audio signals by generating each of the wave-domain microphone audio signals based on a plurality of time-domain microphone audio signals and based on one or more of a plurality of microphone-signal-transformation values, said one or more of the plurality of microphone-signal-transformation values being assigned to said generated wave-domain loudspeaker audio signal, and generating the current loudspeaker-enclosure-microphone system description based on the plurality of wave-domain loudspeaker audio signals, and based on the plurality of wave-domain microphone audio signals, wherein the current loudspeaker-enclosure-microphone system description is generated based on a plurality of coupling values, wherein each of the plurality of coupling values is assigned to one of a plurality of wave-domain pairs, each of the plurality of wave-domain pairs being a pair of one of the plurality of loudspeaker-signal-transformation values and one of the plurality of microphone-signal-transformation values, wherein each coupling value assigned to a wave-domain pair of the plurality of wave-domain pairs is determined by determining for said wave-domain pair at least one relation indicator indicating a relation between said one of the one or more loudspeaker-signal-transformation values of said wave-domain pair and said one of the microphone-signal-transformation values of said wave-domain pair to generate the current loudspeaker-enclosure-microphone system description.
 18. A method for determining at least two filter configurations of a loudspeaker signal filter for at least two different loudspeaker-enclosure-microphone system states, wherein the loudspeaker signal filter is arranged to filter a plurality of loudspeaker input signals to acquire a plurality of filtered loudspeaker signals for steering a plurality of loudspeakers of a loudspeaker-enclosure-microphone system, wherein the method comprises: determining a first loudspeaker-enclosure-microphone system description of a loudspeaker-enclosure-microphone system according to the method of claim 17, when the loudspeaker-enclosure-microphone system comprises a first state, determining a first filter configuration of the loudspeaker signal filter based on the first loudspeaker-enclosure-microphone system description, storing the first filter configuration in a memory, determining a second loudspeaker-enclosure-microphone system description of the loudspeaker-enclosure-microphone system according to the method of claim 17, when the loudspeaker-enclosure-microphone system comprises a second state, determining a second filter configuration of the loudspeaker signal filter based on the second loudspeaker-enclosure-microphone system description, and storing the second filter configuration in the memory.
 19. A non-transitory computer-readable medium comprising a computer program for implementing when being executed by a computer or processor a method for providing a current loudspeaker-enclosure-microphone system description of a loudspeaker-enclosure-microphone system, wherein the loudspeaker-enclosure-microphone system comprises a plurality of loudspeakers and a plurality of microphones, and wherein the method comprises: generating a plurality of wave-domain loudspeaker audio signals by generating each of the wave-domain loudspeaker audio signals based on a plurality of time-domain loudspeaker audio signals and based on one or more of a plurality of loudspeaker-signal-transformation values, said one or more of the plurality of loudspeaker-signal-transformation values being assigned to said generated wave-domain loudspeaker audio signal, and generating a plurality of wave-domain microphone audio signals by generating each of the wave-domain microphone audio signals based on a plurality of time-domain microphone audio signals and based on one or more of a plurality of microphone-signal-transformation values, and generating the current loudspeaker-enclosure-microphone system description based on the plurality of wave-domain loudspeaker audio signals, and based on the plurality of wave-domain microphone audio signals, wherein the current loudspeaker-enclosure-microphone system description is generated based on a plurality of coupling values, wherein each of the plurality of coupling values is assigned to one of a plurality of wave-domain pairs, each of the plurality of wave-domain pairs being a pair of one of the plurality of loudspeaker-signal-transformation values and one of the plurality of microphone-signal-transformation values, wherein each coupling value assigned to a wave-domain pair of the plurality of wave-domain pairs is determined by determining for said wave-domain pair at least one relation indicator indicating a relation between said one of the one or more loudspeaker-signal-transformation values of said wave-domain pair and said one of the microphone-signal-transformation values of said wave-domain pair to generate the loudspeaker-enclosure-microphone system description.
 20. A non-transitory computer-readable medium comprising a computer program for implementing when being executed by a computer or processor a method for determining at least two filter configurations of a loudspeaker signal filter for at least two different loudspeaker-enclosure-microphone system states, wherein the loudspeaker signal filter is arranged to filter a plurality of loudspeaker input signals to acquire a plurality of filtered loudspeaker signals for steering a plurality of loudspeakers of a loudspeaker-enclosure-microphone system, wherein the method comprises: determining a first loudspeaker-enclosure-microphone system description of a loudspeaker-enclosure-microphone system according to the method of claim 17, when the loudspeaker-enclosure-microphone system comprises a first state, determining a first filter configuration of the loudspeaker signal filter based on the first loudspeaker-enclosure-microphone system description, storing the first filter configuration in a memory, determining a second loudspeaker-enclosure-microphone system description of the loudspeaker-enclosure-microphone system according to the method of claim 17, when the loudspeaker-enclosure-microphone system comprises a second state, determining a second filter configuration of the loudspeaker signal filter based on the second loudspeaker-enclosure-microphone system description, and storing the second filter configuration in the memory.
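
By way of illustration only, and without limiting the claims, the relation indicator Δm(q) of claim 12 and the coupling values c_q(n) of claim 11 may be sketched as follows. The function names, the parameter names and the example values (four loudspeakers, impulse response length two, β₁ = 0.1, β₂ = 0.5) are assumptions introduced for this sketch and do not appear in the specification; the sketch only evaluates the two formulas as written and is not the claimed apparatus.

```python
def mode_order_difference(q, m, n_loudspeakers, filter_length):
    """Relation indicator: min(|floor(q/L_H) - m|, |floor(q/L_H) - m - N_L|).

    q              index of the wave-domain pair (0 .. N_L * L_H - 1)
    m              microphone-signal-transformation mode order
    n_loudspeakers N_L, the number of loudspeakers
    filter_length  L_H, the length of the discrete-time impulse response
    """
    # Integer division recovers the loudspeaker-side mode order from q.
    loudspeaker_order = q // filter_length
    return min(abs(loudspeaker_order - m),
               abs(loudspeaker_order - m - n_loudspeakers))


def coupling_values(m, n_loudspeakers, filter_length, beta1, beta2):
    """Diagonal entries c_0 .. c_{N_L * L_H - 1} for one microphone mode m."""
    # The claims require 0 <= beta1 < beta2 <= 1.
    if not 0 <= beta1 < beta2 <= 1:
        raise ValueError("expected 0 <= beta1 < beta2 <= 1")
    values = []
    for q in range(n_loudspeakers * filter_length):
        delta = mode_order_difference(q, m, n_loudspeakers, filter_length)
        if delta == 0:
            values.append(beta1)   # equal mode orders
        elif delta == 1:
            values.append(beta2)   # mode orders differing by one
        else:
            values.append(1.0)     # all remaining pairs
    return values


# Hypothetical example values, chosen only to keep the output short.
example = coupling_values(m=0, n_loudspeakers=4, filter_length=2,
                          beta1=0.1, beta2=0.5)
```

In this sketch, pairs with equal mode orders receive the smallest value β₁, pairs whose mode orders differ by one receive β₂, and every other pair receives 1, which mirrors the three cases of formula (60).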